Bandits can trick you about your outcomes

Posted on Wed 12 February 2020 in blog • 1 min read

My story is about multi-armed bandits and why data collected with bandits leads to underestimated means. If you work with online data (webshops, ads, mobile apps, etc.), using multi-armed bandit algorithms can speed up your testing process. However, measuring the outcomes of the different versions is not that straightforward. This post is published at Emarsys Craftlab; you can read it there.
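To make the underestimation concrete, here is a minimal simulation sketch. It assumes an epsilon-greedy bandit with Gaussian rewards and two arms whose true means are both zero; the algorithm choice, the parameters, and the function names are illustrative assumptions, not taken from the Craftlab post. Because an arm with an unlucky start gets pulled less often, its low estimate is rarely corrected, so the per-arm sample means tend to come out below the truth on average.

```python
import random


def run_epsilon_greedy(true_means, horizon, epsilon, rng):
    """Run one bandit experiment and return the sample mean of each arm."""
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(horizon):
        if t < k:
            arm = t  # pull every arm once to initialise its estimate
        elif rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pick an arm uniformly at random
        else:
            # exploit: pick the arm with the highest sample mean so far
            arm = max(range(k), key=lambda a: sums[a] / counts[a])
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
    return [sums[a] / counts[a] for a in range(k)]


def average_bias(true_means, horizon=50, epsilon=0.1, runs=5000, seed=0):
    """Average (sample mean - true mean) per arm over many repeated experiments."""
    rng = random.Random(seed)
    totals = [0.0] * len(true_means)
    for _ in range(runs):
        estimates = run_epsilon_greedy(true_means, horizon, epsilon, rng)
        for a, estimate in enumerate(estimates):
            totals[a] += estimate - true_means[a]
    return [total / runs for total in totals]


# Both arms have the same true mean (0.0), so any systematic gap is pure bias.
bias = average_bias([0.0, 0.0])
print(bias)
```

With uniform random assignment (an ordinary A/B test) the same averages would hover around zero; the gap appears only because the data collection adapts to past rewards.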


Continue reading

(In)significant thoughts

Posted on Fri 07 September 2018 in blog • 1 min read

In the age of big data, everyone has heard sentences about something being statistically significant or insignificant. But significantly fewer people are able to explain what that truly means. Research shows that even people with formal statistical training misinterpret statistical significance. This post, published at Emarsys Craftlab, is my (in)significant attempt to clear things up.


Continue reading

How to select controls if you are interested in a causal effect?

Posted on Thu 06 July 2017 in blog • 6 min read

The standard practice for deciding about controls, running a t-test on the potential control, is very dangerous. More specifically, it leads to biased estimates and overly optimistic standard errors. We can avoid these problems by also taking into account the correlation between the treatment and the control variables (so-called double selection).


Continue reading

Which binary classification model is better?

Posted on Wed 27 May 2015 in blog • 2 min read

The Receiver Operating Characteristic (ROC) curve is a great tool to visually illustrate the performance of a binary classifier. It plots the true positive rate (TPR), or sensitivity, against the false positive rate (FPR), or 1 - specificity. Usually, the algorithm gives you a probability (e.g. simple logistic regression), so for …
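As a rough illustration of how the curve is built from predicted probabilities, the sketch below sweeps a threshold over the scores and records one (FPR, TPR) point per threshold. The labels, scores, and function names are made up for this example and are not from the post.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points, from the strictest threshold to the loosest."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted positive
    for threshold in sorted(set(scores), reverse=True):
        # An observation is predicted positive when its score reaches the threshold.
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under the piecewise-linear ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical data: 1 = positive class, scores are predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
auc = area_under_curve(points)
print(points)
print(auc)
```

Comparing two classifiers then amounts to comparing their curves, or a summary such as the area under them, on the same data.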


Continue reading

Does IV always identify LATE?

Posted on Mon 02 March 2015 in blog • 8 min read

Instrumental variables are often used in causal analysis when randomized controlled trials are not an option. However, it is not always emphasized that the instrumental variable estimator, even if the instrument is valid and relevant, does not necessarily identify the average treatment effect (ATE) or the average treatment effect on the treated (ATET); most often, it only identifies the local average treatment effect (LATE): the average treatment effect on the complier subpopulation (see Imbens & Angrist, 1994).
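A small simulation can show the gap between LATE and ATE. The sketch below assumes a binary instrument, a binary treatment, no defiers, and a hypothetical population of compliers, always-takers, and never-takers; the shares, effect sizes, and names are invented for illustration. It computes the simple Wald (ratio) form of the IV estimator and compares it with the population ATE.

```python
import random

# Hypothetical population: shares and treatment effects are illustrative assumptions.
TYPES = {
    "complier": {"share": 0.5, "effect": 1.0},
    "always_taker": {"share": 0.2, "effect": 4.0},
    "never_taker": {"share": 0.3, "effect": 0.0},
}


def simulate(n=200_000, seed=0):
    """Return rows of (instrument z, treatment d, outcome y, individual effect)."""
    rng = random.Random(seed)
    names = list(TYPES)
    weights = [TYPES[name]["share"] for name in names]
    rows = []
    for _ in range(n):
        kind = rng.choices(names, weights)[0]
        z = rng.random() < 0.5  # randomly assigned, valid instrument
        # Always-takers take the treatment regardless; compliers only if z is on.
        d = kind == "always_taker" or (kind == "complier" and z)
        effect = TYPES[kind]["effect"]
        y = rng.gauss(0.0, 1.0) + effect * d
        rows.append((z, d, y, effect))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def wald_estimate(rows):
    """IV (Wald) estimator: reduced-form difference divided by first-stage difference."""
    y1 = mean(y for z, d, y, e in rows if z)
    y0 = mean(y for z, d, y, e in rows if not z)
    d1 = mean(d for z, d, y, e in rows if z)
    d0 = mean(d for z, d, y, e in rows if not z)
    return (y1 - y0) / (d1 - d0)


rows = simulate()
iv = wald_estimate(rows)
ate = mean(e for z, d, y, e in rows)
print(iv, ate)
```

Under these assumptions the IV estimate lands near the compliers' effect (1.0) rather than the population average effect (about 1.3), because always-takers and never-takers do not change their treatment status with the instrument.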


Continue reading

Command Line Tricks: How to count lines with a date before a certain date?

Posted on Fri 27 February 2015 in blog • 1 min read

Assume I have a file of downloaded articles structured as follows:

head -2 file-with-dates.csv
title,url,tag,date
"title of the article",http://www.url-of-the-article.com,"tag1,etc",2015-02-27

How could I calculate the number of articles posted before a certain date without having to load …
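For comparison, here is a minimal Python sketch of the same count. It is not the command-line approach the post's title refers to. It streams the file row by row instead of loading it whole, uses the `csv` module so the quoted commas in the `tag` column are parsed correctly, and relies on ISO `YYYY-MM-DD` dates sorting correctly as plain strings. The function name and the second sample row are hypothetical.

```python
import csv


def count_before(lines, cutoff, date_column="date"):
    """Count rows whose date is strictly before `cutoff` (ISO YYYY-MM-DD strings)."""
    reader = csv.DictReader(lines)  # reads one row at a time from any iterable of lines
    return sum(1 for row in reader if row[date_column] < cutoff)


# In-memory sample in the same layout as file-with-dates.csv.
# The second data row is a made-up example.
sample = [
    "title,url,tag,date\n",
    '"title of the article",http://www.url-of-the-article.com,"tag1,etc",2015-02-27\n',
    '"an older article",http://www.example.com/older,"tag2",2014-12-31\n',
]

print(count_before(sample, "2015-01-01"))

# With a real file, pass the open file object so rows are streamed:
# with open("file-with-dates.csv", newline="") as f:
#     print(count_before(f, "2015-01-01"))
```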


Continue reading