Does IV always identify LATE?

Posted on Mon 02 March 2015 in blog

$\newcommand{\E}{\mathbb{E}}$ $\newcommand{\P}{\mathbb{P}}$

Local average treatment effect (LATE)

Instrumental variables are often used in causal analysis when randomized controlled trials are not an option. However, it is not always emphasized that the instrumental variable (IV) estimator, even if the instrument is valid and relevant, does not necessarily identify the average treatment effect (ATE) or the average treatment effect on the treated (ATET). Most often, it only identifies the local average treatment effect (LATE): the average treatment effect on the complier subpopulation (see Imbens & Angrist, 1994).

Who are the compliers? Unfortunately, we do not know. They are the people who are induced into the treatment by the instrumental variable. But because we observe each person's behavior under only one value of the instrument, the data alone cannot tell us who these people are.

Let us take the standard example of a job training program for the unemployed. The treatment is binary (participating in the program or not), and people are randomly assigned to the treatment. However, assignment does not perfectly determine actual participation: some people take up the treatment even when they are not assigned, and others are assigned but do not take the treatment. That is, there is some selection into the treatment. In this case, when certain assumptions hold, one can use the assignment variable as an instrument to identify LATE, the average treatment effect on the complier subpopulation. In other words, we can get the treatment effect on those who take up the treatment because they were assigned and would not have taken it had they not been assigned. This is a useful result even if we are unable to tell who these people are. But we should keep in mind that this result applies only to a subpopulation, and that this subpopulation is likely to change when we use another instrument.

Why is this fact often neglected? Because there is a tradition of assuming a homogeneous treatment effect, i.e., that the effect of the program is the same for everyone. In that case, the effect for the complier subpopulation is the same as the effect for the whole population. But most often we cannot be sure that this assumption holds, and if nothing rules out heterogeneous treatment effects, LATE will generally differ from ATE.
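A small simulation can make the contrast concrete. The sketch below uses made-up type shares (30% always-takers, 40% compliers, 30% never-takers) and made-up effects; it computes the true ATE and LATE directly from the potential outcomes, once with a constant effect and once with an effect that varies by type. The function `simulate_population` is hypothetical, written only for this illustration.

```python
import numpy as np

def simulate_population(n, heterogeneous, seed=0):
    """Simulate compliance types and potential outcomes (hypothetical numbers)."""
    rng = np.random.default_rng(seed)
    # Hypothetical type shares: 30% always-takers, 40% compliers, 30% never-takers.
    types = rng.choice(["always", "complier", "never"], size=n, p=[0.3, 0.4, 0.3])
    y0 = rng.normal(0.0, 1.0, size=n)  # potential outcome without treatment
    if heterogeneous:
        # Hypothetical effects that differ by type.
        effect = np.select(
            [types == "always", types == "complier", types == "never"],
            [3.0, 1.0, 0.0],
        )
    else:
        # Homogeneous case: everyone gains exactly 1.0.
        effect = np.full(n, 1.0)
    y1 = y0 + effect  # potential outcome with treatment
    return types, y0, y1

for het in (False, True):
    types, y0, y1 = simulate_population(100_000, het)
    ate = np.mean(y1 - y0)                          # average over everyone
    late = np.mean((y1 - y0)[types == "complier"])  # average over compliers only
    print(f"heterogeneous={het}: ATE={ate:.2f}, LATE={late:.2f}")
```

With a constant effect both averages equal 1.0; with these hypothetical type-specific effects the ATE is about 1.3, while LATE stays at 1.0 because compliers happen to gain less than always-takers.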

Statistical intuition

What is the statistical intuition behind this result? In causal analysis we generally compare the outcomes of groups that differ in the extent to which they receive a treatment. In the previous example, we would like to compare the outcomes of those who get the treatment with the outcomes of those who do not. We relate the variation in the outcome to the variation in the treatment. In the standard regression framework, we relate variation in $y$ to variation in $x$ and see whether they are correlated. However, as correlation does not imply causation, we should make sure that the correlation we measure is due only to $x$ and not to a third variable.

Return to the job training program for the unemployed. Even if assignment to the program is randomized, people may, at least to some extent, choose whether to take the treatment. In such cases we usually expect that a third (unobservable) variable, let us call it ability, drives part of the variation in both $x$ and $y$: more able people are more likely both to take the treatment and to find a job, irrespective of any program. Therefore, a simple correlation would show a larger "effect", as it would also pick up the correlation generated by the third variable, not just the correlation that runs from $x$ to $y$.
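The bias described above can be reproduced with a hypothetical data-generating process: ability is unobserved, more able people select into the program (a logistic selection rule chosen here for illustration), ability raises the outcome directly, and the true program effect is a constant 1.0. All coefficients are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
true_effect = 1.0  # hypothetical constant effect of the program

ability = rng.normal(size=n)  # unobserved confounder
# More able people are more likely to take the treatment (hypothetical logistic selection).
p_take = 1.0 / (1.0 + np.exp(-2.0 * ability))
d = (rng.uniform(size=n) < p_take).astype(float)
# Ability also raises the outcome directly, regardless of the program.
y = true_effect * d + 2.0 * ability + rng.normal(size=n)

# Simple comparison of participants and non-participants.
naive = y[d == 1].mean() - y[d == 0].mean()
print(f"naive difference in means: {naive:.2f} (true effect {true_effect})")
```

The naive difference comes out well above 1.0 because participants are, on average, more able than non-participants.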

An instrumental variable enables us to use only the part of the variation in $x$ that is independent of the third variable, and to relate only this part to the variation in $y$. This way we measure the part of the correlation between $y$ and $x$ that is due to $x$ alone; in other words, we get the causal effect of $x$ on $y$. This is how two-stage least squares (2SLS) estimation works. We just need a variable that is correlated with $x$ but independent of the unobserved determinants of $y$ (such as ability), that is, a variable that influences $y$ only through $x$. In the case of the job training program, assignment was truly random, so it is independent of ability and of the other unobserved factors behind the outcome; we also assume that being assigned affects job finding only through participation. Using the assignment variable as an instrument, we keep the variation in actual participation that is due to the assignment (and is therefore random) and partial out the rest, which could be due to a third variable that is also correlated with the outcome. Relating this part of the variation in the treatment to the variation in the outcome, we get the effect of the treatment on the outcome, but we must be careful about how to interpret this effect.
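The two stages can be sketched by hand with NumPy least squares. The data-generating process is hypothetical: random assignment `z`, participation driven by both assignment and unobserved ability through a made-up threshold rule, and a constant true effect of 1.0. The helper `ols` is introduced here only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
true_effect = 1.0  # hypothetical constant effect

ability = rng.normal(size=n)                   # unobserved confounder
z = rng.integers(0, 2, size=n).astype(float)   # random assignment (the instrument)
# Participation depends on assignment and on ability (hypothetical threshold model).
latent = 1.5 * z + 1.0 * ability + rng.normal(size=n)
d = (latent > 0.75).astype(float)
y = true_effect * d + 2.0 * ability + rng.normal(size=n)

def ols(X, target):
    """Least-squares coefficients of target on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

const = np.ones(n)
# First stage: keep only the variation in d that is explained by z.
first = ols(np.column_stack([const, z]), d)
d_hat = first[0] + first[1] * z
# Second stage: relate y to that instrument-driven variation.
second = ols(np.column_stack([const, d_hat]), y)
# For comparison: plain OLS of y on d, which is contaminated by ability.
ols_naive = ols(np.column_stack([const, d]), y)

print(f"OLS slope:  {ols_naive[1]:.2f}")
print(f"2SLS slope: {second[1]:.2f}  (true effect {true_effect})")
```

Running the second stage as a plain regression gives the correct 2SLS point estimate, but its reported standard errors would not be valid 2SLS standard errors; a dedicated IV routine should be used for inference.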

As only part of the variation in $x$ (the treatment) was used, we get the effect only for the part of the population whose treatment status is moved by the instrument. Which population is defined by the variation in $x$ due to the instrument? The complier subpopulation: those who participated in the treatment because they were assigned.
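When effects differ across people, the IV estimate tracks the compliers rather than the whole population. The simulation below uses hypothetical type shares (25% always-takers, 50% compliers, 25% never-takers, no defiers) and made-up effects by type. The IV estimate is computed in its Wald form, which with a single binary instrument and no covariates equals the 2SLS slope; the notation $D(0)$, $D(1)$ for potential treatment statuses is formalized later in the post.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Hypothetical compliance types (no defiers): 25% always, 50% compliers, 25% never.
types = rng.choice(["always", "complier", "never"], size=n, p=[0.25, 0.5, 0.25])
# Hypothetical heterogeneous effects: always-takers gain most, never-takers nothing.
effect = np.select([types == "always", types == "complier"], [4.0, 1.0], default=0.0)
y0 = rng.normal(size=n)
y1 = y0 + effect

z = rng.integers(0, 2, size=n)           # randomly assigned instrument
d0 = (types == "always").astype(int)     # D(0): treatment status if not assigned
d1 = (types != "never").astype(int)      # D(1): treatment status if assigned
d = np.where(z == 1, d1, d0)             # observed treatment
y = np.where(d == 1, y1, y0)             # observed outcome

# Wald estimator: reduced-form difference divided by first-stage difference.
wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
print(f"ATE  = {effect.mean():.2f}")
print(f"LATE = {effect[types == 'complier'].mean():.2f}")
print(f"IV (Wald) estimate = {wald:.2f}")
```

Under these assumptions the IV estimate lands near the complier effect of 1.0, not near the population ATE of about 1.5.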

I used a simple program example to illustrate my points, but this reasoning applies to every setup in which instrumental variables are used. Card (1995) estimates the returns to schooling using proximity to a college as an instrument. Proximity is relevant because those who live closer to a college are more likely to attend one; to use it as an instrument, we must also assume that proximity to a college influences future wages only through schooling. Let us maintain this assumption. Then the IV estimator identifies LATE: the returns to schooling for the complier population, that is, for those who went to college because they lived close to one (and would not have gone had they not lived close).

Program evaluation language

Thinking about causal effects is easier in the potential outcomes framework of Donald Rubin. It builds on the idea that the outcome of each individual depends on her treatment status: if she gets the treatment, she realizes an outcome $Y(1)$, whereas if she does not, she realizes a potentially different outcome $Y(0)$. These are called potential outcomes because only one of them is ever realized: a person either gets the treatment (and realizes her potential outcome with the treatment) or does not (and realizes her potential outcome without the treatment).
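In symbols, with $D \in \{0, 1\}$ denoting the treatment actually received, the observed outcome switches between the two potential outcomes:

$$Y = D\,Y(1) + (1 - D)\,Y(0)$$

So for every individual one of the two potential outcomes is missing, and the individual effect $Y(1) - Y(0)$ is never observed directly; only averages over groups can be recovered, and only under suitable assumptions.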

The same logic can be applied to the treatment status, which is influenced by the instrument. When assignment to treatment is used as the instrument, one can be either assigned or not. For an assigned person we can observe whether she participates, but we do not know what she would have done had she not been assigned. Her potential treatment statuses under the two values of the instrument are denoted by $D(0)$ and $D(1)$. In this framework, the compliers are those for whom $D(1) - D(0) = 1$: they participate because of the assignment, participating if assigned $(D(1) = 1)$ but not if not assigned $(D(0) = 0)$. In principle, there could be three other types of people: always-takers, who take the treatment irrespective of their assignment $(D(1) = D(0) = 1)$; never-takers, who do not take the treatment irrespective of their assignment $(D(1) = D(0) = 0)$; and defiers, who participate when not assigned but do not participate when assigned $(D(1) = 0, D(0) = 1)$. Now we can formally state the assumptions needed to identify LATE (I denote the instrument by $Z$).
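Why compliers cannot be singled out can be checked mechanically. The sketch below is plain Python; the function names `compliance_type` and `types_consistent_with` are hypothetical, introduced only for illustration. The first maps $(D(0), D(1))$ to a type; the second lists the types consistent with an observed pair $(Z, D)$, trying both values of the unobserved potential status.

```python
from itertools import product

def compliance_type(d0: int, d1: int) -> str:
    """Classify a person by potential treatment statuses D(0) and D(1)."""
    return {
        (0, 1): "complier",
        (1, 1): "always-taker",
        (0, 0): "never-taker",
        (1, 0): "defier",
    }[(d0, d1)]

def types_consistent_with(z: int, d: int) -> list[str]:
    """Types that could have produced an observed (Z, D) pair.

    For a person with Z = z we only see D(z); the other potential
    status D(1 - z) is unobserved, so we try both of its values.
    """
    types = []
    for other in (0, 1):
        d0, d1 = (d, other) if z == 0 else (other, d)
        types.append(compliance_type(d0, d1))
    return types

for z, d in product((0, 1), repeat=2):
    print(f"Z={z}, D={d}: {types_consistent_with(z, d)}")
```

Every observed cell mixes two types. Ruling out defiers leaves only always-takers in the $(Z=0, D=1)$ cell and only never-takers in the $(Z=1, D=0)$ cell, but compliers always remain mixed with another type.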

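$Z \perp (D(0), D(1), Y(0), Y(1))$ -- The instrument is independent of the potential treatment statuses and the potential outcomes ("as good as random"). Writing the potential outcomes as $Y(d)$ rather than $Y(d, z)$ also builds in the exclusion restriction: the instrument affects the outcome only through the treatment. The instrument is valid.
$\P[D=1 | Z=1] \neq \P[D=1 | Z = 0]$ -- The instrument does influence the treatment status, that is, there are at least some compliers. The instrument is relevant.
$D(1) \geq D(0)$ for everyone -- There are no defiers (monotonicity).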
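To see why relevance amounts to the existence of compliers, write $\pi_a$, $\pi_c$ and $\pi_n$ for the population shares of always-takers, compliers and never-takers (notation introduced here for illustration). Since $D = D(Z)$, independence replaces the conditional participation rates by shares of potential statuses, and with no defiers:

$$\begin{aligned}
\P[D=1 | Z=1] &= \P[D(1)=1] = \pi_a + \pi_c,\\
\P[D=1 | Z=0] &= \P[D(0)=1] = \pi_a,\\
\P[D=1 | Z=1] - \P[D=1 | Z=0] &= \pi_c.
\end{aligned}$$

So, under the other two assumptions, the instrument is relevant exactly when $\pi_c > 0$.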

The first two assumptions are standard IV assumptions. The third one (monotonicity) is new, but it is usually much less restrictive than assuming a homogeneous treatment effect. If these assumptions hold, one can show that the classical IV estimator identifies LATE. In our training example, where the instrument is binary, the estimator reduces to the so-called Wald estimator:

$$LATE = \E[Y(1) - Y(0)|D(1) - D(0) = 1] = \frac{\E[Y|Z=1] - \E[Y|Z=0]}{\P[D=1 | Z=1] - \P[D=1 | Z=0]}$$

Intuitively, this formula says the following. There are two subgroups of people: those who were assigned $(Z = 1)$ and those who were not $(Z = 0)$. As the assignment is random with respect to the potential outcomes, two results follow: (1) both subgroups have the same distribution of types in terms of their potential treatment statuses (compliers, always-takers and never-takers), and (2) both subgroups have the same distribution of potential outcomes. Always-takers realize $Y(1)$ and never-takers realize $Y(0)$ in both subgroups, so they contribute equally to both averages. Therefore, any difference between the average outcomes of the subgroups can only come from the compliers, who realize $Y(1)$ in the assigned group and $Y(0)$ in the non-assigned group. This difference is the effect of the program on the compliers, diluted by their share in the population. Dividing by the share of compliers, which is exactly the difference in participation rates in the denominator, gives LATE.
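The same argument can be written out with the type shares. Let $\pi_a$, $\pi_c$, $\pi_n$ denote the shares of always-takers, compliers and never-takers, and $T \in \{a, c, n\}$ a person's type (notation introduced here for illustration; no defiers). By independence, both subgroups have the same type shares and the same conditional means of potential outcomes:

$$\begin{aligned}
\E[Y|Z=1] &= \pi_a \E[Y(1)|T=a] + \pi_c \E[Y(1)|T=c] + \pi_n \E[Y(0)|T=n],\\
\E[Y|Z=0] &= \pi_a \E[Y(1)|T=a] + \pi_c \E[Y(0)|T=c] + \pi_n \E[Y(0)|T=n],\\
\E[Y|Z=1] - \E[Y|Z=0] &= \pi_c\, \E[Y(1) - Y(0)|T=c].
\end{aligned}$$

The denominator of the Wald estimator equals $\P[D(1)=1] - \P[D(0)=1] = (\pi_a + \pi_c) - \pi_a = \pi_c$, so the ratio is $\E[Y(1) - Y(0)|T=c]$, which is LATE.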

The program evaluation language extends naturally to other setups as well. Let us consider the example of returns to schooling. $Y(1)$ expresses the potential wage of the individual if she goes to college, and $Y(0)$ stands for her potential wage if she does not go to college. $D(1)$ says whether the individual goes to college if she lives close to a college, and $D(0)$ expresses whether she goes to college if she lives far from one. The IV estimator gives us the returns to schooling for those who went to college because they lived close to one.

Long story short

The classical IV estimator most often identifies LATE: the average treatment effect on the complier population. The complier population consists of those who take up the treatment because of the instrument.
If the treatment effect is heterogeneous, LATE generally differs from ATE and ATET.
Statistical intuition: when using IV, we partial out the variation in the endogenous explanatory variable $x$ that is not due to the instrument $z$. So any correlation between the outcome $y$ and the remaining, instrument-driven part of $x$ reflects the units whose $x$ varies because of $z$.
The potential outcomes framework makes thinking about causality easier.
As the instrument is random with respect to the potential outcomes, the subgroups defined by the instrument should be the same in terms of their potential outcome distributions.
If we see any difference in the average outcomes of the subgroups, it can only be due to the different take-up rates of the subgroups (driven by the compliers), and it shows the effect of the program. One should only scale by the difference in the take-up rates to get LATE.