In class yesterday we reviewed a few key concepts, which I’m going to revisit here for posterity.
Also, here is a link to a nice monograph on regression discontinuity designs (RDD). Please read chapters 1 and 2 (17 pages in total) before class tomorrow.
The first agenda item is to consider in more detail the proof of Theorem 1 in Angrist and Imbens (1991). The theorem shows that the average effect of the treatment on the treated (ATT) is identified if two assumptions are satisfied. The proof is elementary, but the presentation in the working paper is extremely telegraphic. Here we fill in details. The two conditions are:
- there exists an instrumental variable $Z$ such that $Y_d \perp Z$ for all $d \in \{0, 1\}$, (exclusion restriction)
- $P(D = 1 \mid Z = 0) = 0$. (eligibility instrument)

The first condition obtains, for example, if $Z$ arises independently of any of the factors affecting the outcome (other than the treatment assignment) and prior to the treatment assignment $D$. The second condition can be interpreted as saying that $Z$ denotes eligibility to receive treatment: units with $Z = 0$ cannot be treated.
First, we must show that $E(Y \mid Z = 0) = E(Y_0)$ (a fact that is merely stated in the paper). This claim requires both assumptions. To show it, we write the observed outcome as

$$Y = Y_0 + D(Y_1 - Y_0)$$

and then take expectations conditional on $Z = 0$:

$$E(Y \mid Z = 0) = E(Y_0 \mid Z = 0) + E(D(Y_1 - Y_0) \mid Z = 0).$$

The first term on the right becomes $E(Y_0)$ by the first assumption. The second term can be written with iterated expectation as

$$E(D(Y_1 - Y_0) \mid Z = 0) = P(D = 1 \mid Z = 0)\,E(Y_1 - Y_0 \mid D = 1, Z = 0) = 0$$

by the second assumption.

Next, we consider $E(Y \mid Z = 1)$, taking the same approach as above, but with $Z = 1$ this time:

$$E(Y \mid Z = 1) = E(Y_0 \mid Z = 1) + E(D(Y_1 - Y_0) \mid Z = 1).$$

As before, the first term is $E(Y_0)$ by the first assumption. And, by the first part above, $E(Y_0)$ is equal to $E(Y \mid Z = 0)$ (an estimable quantity). Iterated expectation on the second term gives

$$E(D(Y_1 - Y_0) \mid Z = 1) = P(D = 1 \mid Z = 1)\,E(Y_1 - Y_0 \mid D = 1, Z = 1).$$

Next, we recognize that we can substitute $E(Y_1 - Y_0 \mid D = 1)$ for $E(Y_1 - Y_0 \mid D = 1, Z = 1)$ by the second assumption: restricting to situations where $D = 1$ and $Z = 1$ is exactly restricting to situations where just $D = 1$, because all of these have $Z = 1$ by assumption.

Putting all these pieces together allows us to solve for the desired treatment effect as

$$E(Y_1 - Y_0 \mid D = 1) = \frac{E(Y \mid Z = 1) - E(Y \mid Z = 0)}{P(D = 1 \mid Z = 1)}.$$
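As a sanity check on this identification formula, here is a minimal Python simulation. The data generating process (a binary instrument, selection into treatment based on gains) and all parameter values are illustrative choices, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Binary instrument: Z = 1 means eligible for treatment.
Z = rng.binomial(1, 0.5, n)

# Potential outcomes, generated independently of Z (exclusion restriction).
Y0 = rng.normal(0, 1, n)
Y1 = Y0 + 2.0 + rng.normal(0, 1, n)

# Treatment: only eligible units can be treated (eligibility instrument),
# and among the eligible, units with larger gains are more likely to take up.
D = Z * ((Y1 - Y0 + rng.normal(0, 1, n)) > 2.0).astype(int)

# Observed outcome.
Y = Y0 + D * (Y1 - Y0)

# ATT via the identification formula (everything here is estimable)...
att_hat = (Y[Z == 1].mean() - Y[Z == 0].mean()) / D[Z == 1].mean()

# ...versus the (normally unobservable) truth.
att_true = (Y1[D == 1] - Y0[D == 1]).mean()
print(att_hat, att_true)
```

Note that because take-up depends on the gain $Y_1 - Y_0$, the ATT here exceeds the average effect of 2 in the whole population — and the formula still recovers it, since nothing in the proof required treatment take-up to be independent of the potential outcomes.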
The next item is to look at identification of the treatment effect in linear instrumental variables models. For simplicity, we will consider a continuous treatment and a continuous instrument. In this case our structural equation representation is

$$D = \gamma Z + U + \varepsilon_D, \qquad Y = \beta D + U + \varepsilon_Y,$$

where $Z$, $U$, $\varepsilon_D$, and $\varepsilon_Y$ are all mutually independent. The shared dependence on $U$ is what makes direct regression of $Y$ on $D$ inappropriate for determining the treatment effect $\beta$.

However, observe that substituting the equation for $D$ into the equation for $Y$ yields

$$Y = \beta\gamma Z + (\beta + 1)U + \beta\varepsilon_D + \varepsilon_Y.$$

But here we can recognize that the "error term" $(\beta + 1)U + \beta\varepsilon_D + \varepsilon_Y$ is now independent of $Z$, meaning that regression of $Y$ on $Z$ will yield an estimate of $\beta\gamma$. A regression of $D$ on $Z$ will yield an estimate of $\gamma$, and $\beta$ can be obtained as a ratio.

Likewise, if we had $\hat{\gamma}$ in hand, we could regress $Y$ on the fitted values $\hat{D} = \hat{\gamma}Z$ to obtain an estimate of $\beta$. This approach, using a "first stage" estimate of $D$, is called "two-stage least squares" or 2SLS.
Here is an R script briefly demonstrating these ideas in action.
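In the same spirit, here is a minimal Python sketch of the demonstration, using a linear model $D = \gamma Z + U + \varepsilon_D$, $Y = \beta D + U + \varepsilon_Y$ with simulated data; the true values $\beta = 2$ and $\gamma = 1.5$ are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
beta, gamma = 2.0, 1.5  # true treatment effect and first-stage coefficient

Z = rng.normal(0, 1, n)                   # instrument
U = rng.normal(0, 1, n)                   # unobserved confounder
D = gamma * Z + U + rng.normal(0, 1, n)   # treatment
Y = beta * D + U + rng.normal(0, 1, n)    # outcome

def slope(x, y):
    # OLS slope of y on x.
    return np.cov(x, y)[0, 1] / np.var(x)

naive = slope(D, Y)                 # biased upward: shared dependence on U
ratio = slope(Z, Y) / slope(Z, D)   # (beta*gamma) / gamma = beta
D_hat = slope(Z, D) * Z             # first-stage fitted values
tsls = slope(D_hat, Y)              # second stage: also recovers beta
print(naive, ratio, tsls)
```

The naive regression coefficient is contaminated by $U$, while both the ratio and the 2SLS estimate land on the true $\beta$.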
Finally, we took a look at how to think about the do-operator, as distinct from “vanilla” conditioning. The basic insight is that conditioning is “filtering”, while “do-ing” is “short-circuiting”.
In more detail, if you think about the data generating process as an algorithm that takes stochastic inputs, conditioning proceeds by generating all of the outputs and then simply restricting the output to those realizations satisfying a particular condition (hence “conditioning”).
The do-operator, by contrast, generates all of the data, then replaces every draw of the intervened-upon variable with a single fixed value, and then re-generates any variables that depend (structurally/causally) on the variable you replaced.
Here is a script that briefly demonstrates this idea in action.
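A minimal Python version of such a demonstration, using an illustrative confounded model in which $U$ drives both $D$ and $Y$ and the true effect of $D$ on $Y$ is 1:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

U = rng.normal(0, 1, n)                        # common cause
D = (U + rng.normal(0, 1, n) > 0).astype(int)  # treatment depends on U
Y = D + U + rng.normal(0, 1, n)                # true effect of D on Y is 1

# Conditioning = filtering: generate everything, then keep only the
# realizations satisfying D = 1 (or D = 0). U leaks in through D.
cond_diff = Y[D == 1].mean() - Y[D == 0].mean()

# do-ing = short-circuiting: overwrite every draw of D with a fixed value,
# then re-generate Y (the only variable downstream of D) from its equation.
def do(d):
    D_do = np.full(n, d)
    return D_do + U + rng.normal(0, 1, n)

do_diff = do(1).mean() - do(0).mean()
print(cond_diff, do_diff)
```

The conditioned contrast is inflated by the confounder, while the do-contrast recovers the structural effect of 1.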
