Double robustness

Suppose we are interested in estimating the average treatment effect (ATE), defined in potential-outcome notation as

\tau = E(E(Y^1 \mid X) - E(Y^0 \mid X))

where the outer expectation is over X. Assuming strong ignorability (the potential outcomes are independent of D given X, and 0 < P(D = 1 \mid X) < 1), so that

E(Y \mid D = 1, X) = E(Y^1 \mid X)

and

E(Y \mid D = 0, X) = E(Y^0 \mid X),

the ATE can be written as

\tau = E(E(Y \mid D = 1, X) - E(Y \mid D = 0, X))

where the outer expectation is over X. Recall that the ATE can also be written in inverse-probability-weighted form as

\tau = E\left(\frac{YD}{P(D = 1 \mid X)} - \frac{Y(1-D)}{P(D = 0 \mid X)}\right)

where the outer expectation is over the joint distribution of (Y, D, X).
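Both identities can be checked numerically. The sketch below assumes a hypothetical data-generating process that does not come from the text: X uniform on [-2, 2], a logistic propensity score, and linear potential outcomes chosen so that the true ATE is 1. The regression form fits E(Y | D = d, X) by least squares within each arm, the weighting form uses the known propensity score, and a naive difference in means is included for contrast.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating process (an assumption, not from the text):
# X ~ Uniform(-2, 2), P(D = 1 | X) = 1 / (1 + exp(-X)),
# Y^1 = 1 + 2X + noise, Y^0 = X + noise, so tau = E(1 + X) = 1.
x = rng.uniform(-2.0, 2.0, size=n)
e = 1.0 / (1.0 + np.exp(-x))              # true P(D = 1 | X)
d = rng.binomial(1, e)                    # treatment assignment depends on X
y1 = 1.0 + 2.0 * x + rng.normal(size=n)   # potential outcome under treatment
y0 = x + rng.normal(size=n)               # potential outcome under control
y = np.where(d == 1, y1, y0)              # only one potential outcome is observed

# Regression form: estimate E(Y | D = 1, X) and E(Y | D = 0, X) with a linear
# least-squares fit in each arm, then average the difference over all X.
coef1 = np.polyfit(x[d == 1], y[d == 1], deg=1)
coef0 = np.polyfit(x[d == 0], y[d == 0], deg=1)
tau_reg = np.mean(np.polyval(coef1, x) - np.polyval(coef0, x))

# Weighting form, using P(D = 0 | X) = 1 - P(D = 1 | X).
tau_ipw = np.mean(y * d / e - y * (1 - d) / (1 - e))

# Naive difference in means, which ignores confounding by X.
tau_naive = y[d == 1].mean() - y[d == 0].mean()

print(f"regression: {tau_reg:.3f}  weighting: {tau_ipw:.3f}  naive: {tau_naive:.3f}")
```

Under this design the first two estimates should land close to 1, while the naive difference is pulled well above 1, because units with large X are both more likely to be treated and have larger outcomes.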

Now, consider the estimand

E(f(X)) + E\left(\frac{YD}{g(X)}\right) - E\left(\frac{f(X)D}{g(X)}\right)

where each expectation is taken over the joint distribution of (Y, D, X), and f(\cdot) and g(\cdot) are fixed functions of X, with g(X) > 0 so that the ratios are defined. We will see what happens when one of these functions is correctly specified, considering two cases in turn.
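A sample analogue replaces each expectation with an average over observations. The function name `dr_treated_mean` and the simulated data are illustrative assumptions: the hypothetical design has X uniform on [-2, 2], a logistic propensity score, and E(Y | D = 1, X) = 1 + 2X, so that E(E(Y^1 | X)) = 1. Here both f and g are set to their true values.

```python
import numpy as np

def dr_treated_mean(y, d, x, f, g):
    """Sample analogue of E(f(X)) + E(YD / g(X)) - E(f(X) D / g(X)).

    f and g are fixed functions of X; g must be positive wherever d == 1.
    """
    fx, gx = f(x), g(x)
    return np.mean(fx) + np.mean(y * d / gx) - np.mean(fx * d / gx)

# Hypothetical simulated data (an assumption for illustration only).
rng = np.random.default_rng(1)
n = 200_000
x = rng.uniform(-2.0, 2.0, size=n)
d = rng.binomial(1, 1.0 / (1.0 + np.exp(-x)))
y = np.where(d == 1, 1.0 + 2.0 * x, x) + rng.normal(size=n)

def true_f(x):
    return 1.0 + 2.0 * x              # E(Y | D = 1, X) under this design

def true_g(x):
    return 1.0 / (1.0 + np.exp(-x))   # P(D = 1 | X) under this design

est = dr_treated_mean(y, d, x, true_f, true_g)
print(f"estimate of E(Y^1): {est:.3f} (true value 1)")
```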

Suppose f(X) = E(Y \mid D = 1, X). In this case, our estimand becomes

E(E(Y \mid D = 1, X)) + E\left(\frac{YD}{g(X)}\right) - E\left(\frac{E(Y \mid D = 1, X)D}{g(X)}\right).

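By iterated expectation, conditioning on (D, X), the middle term can be rewritten as

E\left(\frac{E(Y \mid D, X)D}{g(X)}\right) = E\left(\frac{E(Y \mid D = 1, X)D}{g(X)}\right),

where the equality holds because D is binary, so the product is zero unless D = 1. This is exactly the third term, so the two cancel, leaving only the first term, which under strong ignorability equals E(E(Y^1 \mid X)).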
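To see this case numerically, the sketch below keeps f equal to the true E(Y | D = 1, X) but replaces g with a constant 0.5, which ignores how treatment depends on X. The design (X uniform on [-2, 2], logistic propensity, E(Y^1 | X) = 1 + 2X, target value 1) and the function names are hypothetical. The estimate should stay near 1, while weighting alone with the same wrong g does not.

```python
import numpy as np

def dr_treated_mean(y, d, x, f, g):
    # Sample analogue of E(f(X)) + E(YD/g(X)) - E(f(X)D/g(X)).
    fx, gx = f(x), g(x)
    return np.mean(fx) + np.mean(y * d / gx) - np.mean(fx * d / gx)

# Hypothetical simulated data: true P(D = 1 | X) is logistic in X and
# E(Y | D = 1, X) = 1 + 2X, so the target E(E(Y^1 | X)) equals 1.
rng = np.random.default_rng(2)
n = 200_000
x = rng.uniform(-2.0, 2.0, size=n)
d = rng.binomial(1, 1.0 / (1.0 + np.exp(-x)))
y = np.where(d == 1, 1.0 + 2.0 * x, x) + rng.normal(size=n)

def f_correct(x):
    return 1.0 + 2.0 * x           # correctly specified outcome regression

def g_wrong(x):
    return np.full_like(x, 0.5)    # misspecified: ignores the dependence on X

est_dr = dr_treated_mean(y, d, x, f_correct, g_wrong)
est_ipw_only = np.mean(y * d / g_wrong(x))   # weighting alone with the wrong g
print(f"doubly robust: {est_dr:.3f}  weighting only: {est_ipw_only:.3f}")
```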

Suppose instead g(X) = P(D = 1 \mid X). In this case, our estimand becomes

E(f(X)) + E\left(\frac{YD}{P(D = 1 \mid X)}\right) - E\left(\frac{f(X)D}{P(D = 1 \mid X)}\right).

By iterated expectation, conditioning on X and using E(D \mid X) = P(D = 1 \mid X) for binary D, the third term becomes

E\left(\frac{f(X)P(D = 1 \mid X)}{P(D = 1 \mid X)}\right) = E(f(X)),

which cancels with the first term. What remains is the second term, the inverse-probability-weighted mean of the treated outcomes, which by the weighting identity for the ATE above equals E(E(Y^1 \mid X)).
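The mirror-image check: the sketch below uses the true propensity score for g but a deliberately wrong outcome function f(X) = 0.5X. The design and names are hypothetical, with the same target value of 1. The doubly robust estimate should again be near 1, whereas the plug-in average of the wrong f alone is not.

```python
import numpy as np

def dr_treated_mean(y, d, x, f, g):
    # Sample analogue of E(f(X)) + E(YD/g(X)) - E(f(X)D/g(X)).
    fx, gx = f(x), g(x)
    return np.mean(fx) + np.mean(y * d / gx) - np.mean(fx * d / gx)

# Hypothetical simulated data: target E(E(Y^1 | X)) equals 1.
rng = np.random.default_rng(3)
n = 200_000
x = rng.uniform(-2.0, 2.0, size=n)
d = rng.binomial(1, 1.0 / (1.0 + np.exp(-x)))
y = np.where(d == 1, 1.0 + 2.0 * x, x) + rng.normal(size=n)

def f_wrong(x):
    return 0.5 * x                   # misspecified outcome regression

def g_correct(x):
    return 1.0 / (1.0 + np.exp(-x))  # true P(D = 1 | X) under this design

est_dr = dr_treated_mean(y, d, x, f_wrong, g_correct)
est_plugin = np.mean(f_wrong(x))     # regression plug-in alone with the wrong f
print(f"doubly robust: {est_dr:.3f}  plug-in only: {est_plugin:.3f}")
```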

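If both conditions hold, the estimand still equals E(E(Y^1 \mid X)), but only one of them is needed; this is the sense in which the estimand is doubly robust. Applying the same reasoning with D replaced by 1 - D to the estimand

E(f_0(X)) + E\left(\frac{Y(1-D)}{g_0(X)}\right) - E\left(\frac{f_0(X)(1-D)}{g_0(X)}\right)

shows that it equals E(E(Y^0 \mid X)) whenever f_0(X) = E(Y \mid D = 0, X) or g_0(X) = P(D = 0 \mid X). Subtracting it from the treated-arm estimand gives the ATE.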
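Putting the two arms together gives an estimate of the ATE. In this sketch the outcome regressions are fitted by least squares within each arm (correctly specified only because the hypothetical design has linear conditional means), while the propensity score is deliberately set to a constant 0.5. The helper `dr_arm_mean` is a hypothetical name that takes f and g already evaluated at each observation's X. The true ATE under this design is 1.

```python
import numpy as np

def dr_arm_mean(y, t, fx, gx):
    # E(f(X)) + E(Y T / g(X)) - E(f(X) T / g(X)) for a binary indicator T,
    # with f and g already evaluated at each observation's X.
    return np.mean(fx) + np.mean(y * t / gx) - np.mean(fx * t / gx)

# Hypothetical simulated data with true ATE equal to 1.
rng = np.random.default_rng(4)
n = 200_000
x = rng.uniform(-2.0, 2.0, size=n)
d = rng.binomial(1, 1.0 / (1.0 + np.exp(-x)))
y = np.where(d == 1, 1.0 + 2.0 * x, x) + rng.normal(size=n)

# Outcome regressions E(Y | D = 1, X) and E(Y | D = 0, X), fitted per arm.
f1x = np.polyval(np.polyfit(x[d == 1], y[d == 1], deg=1), x)
f0x = np.polyval(np.polyfit(x[d == 0], y[d == 0], deg=1), x)

# Deliberately misspecified propensity score: a constant 0.5.
g1x = np.full_like(x, 0.5)

mu1 = dr_arm_mean(y, d, f1x, g1x)            # estimates E(E(Y^1 | X))
mu0 = dr_arm_mean(y, 1 - d, f0x, 1.0 - g1x)  # D replaced by 1 - D
tau_dr = mu1 - mu0
print(f"E(Y^1): {mu1:.3f}  E(Y^0): {mu0:.3f}  ATE: {tau_dr:.3f}")
```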

If neither f(X) = E(Y \mid D = 1, X) nor g(X) = P(D = 1 \mid X) holds, neither cancellation argument applies, and the estimand generally differs from E(E(Y^1 \mid X)), so it cannot be relied on to recover the ATE.
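When both nuisance functions are wrong, the bias is visible in the same kind of simulation. This sketch sets both outcome functions to zero and the propensity score to a constant 0.5 under the hypothetical design used above (true ATE of 1); the helper name is again an illustrative assumption.

```python
import numpy as np

def dr_arm_mean(y, t, fx, gx):
    # E(f(X)) + E(Y T / g(X)) - E(f(X) T / g(X)) for a binary indicator T.
    return np.mean(fx) + np.mean(y * t / gx) - np.mean(fx * t / gx)

# Hypothetical simulated data with true ATE equal to 1.
rng = np.random.default_rng(5)
n = 200_000
x = rng.uniform(-2.0, 2.0, size=n)
d = rng.binomial(1, 1.0 / (1.0 + np.exp(-x)))
y = np.where(d == 1, 1.0 + 2.0 * x, x) + rng.normal(size=n)

zeros = np.zeros_like(x)       # wrong outcome regressions for both arms
half = np.full_like(x, 0.5)    # wrong propensity score (ignores X)

tau_both_wrong = dr_arm_mean(y, d, zeros, half) - dr_arm_mean(y, 1 - d, zeros, half)
print(f"ATE estimate with both models wrong: {tau_both_wrong:.3f} (true value 1)")
```

With these choices the correction terms reduce to reweighted arm averages that still reflect the confounding by X, so the estimate is pulled well away from 1.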

 
