# Is this estimand really an average treatment effect?

**Update:** See the preprint
here based on this blog post.

## Background

My Stanford office-mate and good friend
Josh Loftus messaged me
about an
interesting debate about selection bias in observational
studies of racially biased policing. The key issue here is that to
understand racial discrimination in police use of force (or
prosecutions), one must bear in mind we only have data for those who
were stopped (or arrested) by the police. If there is unmeasured
confounding between stopping and use of force (which seems quite
likely to me), we could incur collider bias by conditioning on the
individuals being stopped. This important point is brought up by a
group of political scientists at Princeton in
a paper recently
published in the *American Political Science Review*.

However, another group of researchers at Stanford and NYU arXived a fairly harsh criticism of the Princeton paper this past week, pointing to a “mathematical error” in the original paper. The key argument is well illustrated by a diagram in Theorem 9 of the paper. Briefly speaking, the Princeton paper claimed that several assumptions, including treatment ignorability, mediator ignorability, and mediator monotonicity, are necessary for the identification of a controlled direct effect; but in fact, they only showed that these assumptions are sufficient. The new paper shows that a weaker assumption called subset ignorability is sufficient for the identification. This led to some heated exchanges on Twitter, perhaps somewhat fueled by the recent political events.

So Josh asked for my opinion about this debate. After briefly reading both papers and discussing with him, I largely agree with him that the criticism is valid but misplaced. It is true that, strictly in terms of the mathematics, the necessity claim in the Princeton paper is incorrect. However, if we put this discussion in the context of racially biased policing, it seems that subset ignorability is not really any weaker than treatment ignorability and mediator ignorability. By that I mean it is difficult to imagine a scenario in this application where subset ignorability holds but the stronger assumptions don’t.

I suspect the fundamental reason for the confusion here is the translation of causal graphical models to counterfactuals/potential outcomes. Counterfactuals are a much richer language than graphs for describing causality, and people may interpret the same causal graph differently (see this paper for some possibilities). Moreover, it is often the case that identification of certain causal quantities only requires a subset of the counterfactual independence assumptions implied by the graphical model. So the minimal counterfactual assumption may often be mathematically weaker than the graphical assumption, but that may not translate to materially different interpretations in practice.

## An interesting causal (?) estimand

The point of this post, however, is not about the plausibility of the identification assumptions and how to state them in an empirical paper. That can be the subject of another post. I am writing this post because I find an estimand in the Princeton paper quite interesting. Much of that paper is about bounding and identifying the quantity below (equation 4 of the paper):

\[ \text{ATE}_{M=1} = \mathbb{E}[Y(1, M(1)) \mid M = 1] - \mathbb{E}[Y(0, M(0)) \mid M = 1]. \]

The authors refer to this as an (average treatment) “effect among those stopped by police”, most likely due to its similar form with the average treatment effect (equation 3):

\[ \text{ATE} = \mathbb{E}[Y(1, M(1))] - \mathbb{E}[Y(0,M(0))]. \]

Notice that \(M\) is a mediator in this problem (see below for the causal model). By recursive substitution (or consistency of counterfactuals), we have \(Y(1, M(1)) = Y(1)\) and \(Y(0, M(0)) = Y(0)\). So the ATE above reduces to the definition that we normally see in the causal inference literature.

However, \(\text{ATE}_{M=1}\) conditions on a post-treatment variable \(M\), so it is not a usual conditional average treatment effect that appears when investigating effect modification. So the question I would like to answer in this post is: can we still interpret \(\text{ATE}_{M=1}\) as an average treatment effect as usual?

## The causal model

It may be helpful to review the causal model being considered here. Let \(D\) be the treatment (race), \(M\) be the mediator (stop), and \(Y\) be the outcome (use of force). The Princeton paper is generally interested in the case where there is an unmeasured confounder between \(M\) and \(Y\) (so conditioning on \(M\) induces collider bias). But for the purpose of understanding \(\text{ATE}_{M=1}\), let’s consider the simplest case where there is no unmeasured confounder whatsoever. The relationship of the three variables can be described by the simple diagram below:

```mermaid
graph LR;
D-->M;
M-->Y;
D-->Y;
```

In the racially biased policing problem, all the variables here are binary. \(D=0\) means that the person is white (\(D=1\) is black), \(M=1\) means that the person is stopped by the police, and \(Y=1\) means that the police use force.

## Expressing \(\text{ATE}_{M=1}\) as principal strata effects

By the law of total expectation, we have

\begin{align*} \text{ATE}_{M=1} &= \mathbb{E}[Y(1) - Y(0) \mid M = 1] \\ &= \sum_{m_0=0}^1 \sum_{m_1 = 0}^1 \mathbb{E}[Y(1) - Y(0) \mid M = 1, M(0) = m_0, M(1) = m_1] \\ & \qquad \qquad \quad \cdot \mathbb{P}(M(0) = m_0, M(1) = m_1 \mid M = 1) \\ &= \sum_{m_0=0}^1 \sum_{m_1 = 0}^1 \mathbb{E}[Y(1, m_1) - Y(0, m_0) \mid M = 1, M(0) = m_0, M(1) = m_1] \\ & \qquad \qquad \quad \cdot \mathbb{P}(M(0) = m_0, M(1) = m_1 \mid M = 1) \\ &= \sum_{m_0=0}^1 \sum_{m_1 = 0}^1 \mathbb{E}[Y(1, m_1) - Y(0, m_0)] \cdot \mathbb{P}(M(0) = m_0, M(1) = m_1 \mid M = 1). \end{align*}

The last two equalities use recursive substitution \(Y(d) = Y(d, M(d))\) and the assumption that \(D\) and \(M\) are randomized, so \(\{M,M(0),M(1)\} \perp \{Y(d,m) \mid d,m \in \{0,1\}\}\). Notice that the last independence assumption involves cross-world counterfactuals, so I am using Pearl’s NPSEM-IE interpretation of the causal diagram; see this paper by Richardson and Robins.

The last conditional probability in the above display equation can be simplified using Bayes’ formula:

\begin{align*} &\mathbb{P}(M(0) = m_0, M(1) = m_1 \mid M = 1) \\ =& \mathbb{P}(M = 1 \mid M(0) = m_0, M(1) = m_1) \cdot \frac{\mathbb{P}(M(0) = m_0, M(1) = m_1)}{\mathbb{P}(M=1)}. \\ \end{align*}

Using the law of total probability and \(D \perp \{M(0), M(1)\}\), we have

\begin{align*} &\mathbb{P}(M = 1 \mid M(0) = m_0, M(1) = m_1) \\ &= \sum_{d=0}^1 \mathbb{P}(M = 1 \mid M(0) = m_0, M(1) = m_1, D=d) \mathbb{P}(D = d) \\ &= \sum_{d=0}^1 1_{\{m_d=1\}} \cdot \mathbb{P}(D = d). \end{align*}

Combining the results above, we get

\begin{align*} &\text{ATE}_{M=1} \\ =& \frac{\sum_{m_0,m_1} \mathbb{E}[Y(1, m_1) - Y(0, m_0)] \cdot \mathbb{P}(M(0) = m_0, M(1) = m_1) \cdot \sum_{d=0}^1 1_{\{m_d=1\}} \cdot \mathbb{P}(D = d)}{\mathbb{P}(M=1)} \\ =& \mathbb{E}[Y(1, 1) - Y(0, 1)] \cdot \frac{\mathbb{P}(\text{always stop})}{\mathbb{P}(M=1)} \\ &+ \mathbb{E}[Y(1, 1) - Y(0, 0)] \cdot \frac{\mathbb{P}(\text{black-only stop}) \cdot \mathbb{P}(D = 1)}{\mathbb{P}(M=1)} \\ &+ \mathbb{E}[Y(1, 0) - Y(0, 1)] \cdot \frac{\mathbb{P}(\text{white-only stop}) \cdot \mathbb{P}(D = 0)}{\mathbb{P}(M=1)}. \end{align*}
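As a sanity check on this decomposition, here is a minimal Python sketch that compares the closed-form expression with a Monte Carlo estimate of \(\mathbb{E}[Y(1) - Y(0) \mid M = 1]\). The function names, the strata probabilities, and the mean potential outcomes are all hypothetical values I made up for illustration; the simulation assumes binary \(Y(d,m)\) drawn independently of \(D\) and of the principal strata, which matches the randomized setting considered in this section.

```python
import random

# Principal strata are indexed by the pair (M(0), M(1)).
ALWAYS, BLACK_ONLY, WHITE_ONLY, NEVER = (1, 1), (0, 1), (1, 0), (0, 0)


def ate_m1_closed_form(p_strata, p_d1, ey):
    """Evaluate the weighted-average formula for ATE_{M=1}.

    p_strata maps (m0, m1) to P(M(0)=m0, M(1)=m1);
    ey maps (d, m) to E[Y(d, m)].
    """
    p_d = {0: 1.0 - p_d1, 1: p_d1}
    numerator, p_m1 = 0.0, 0.0
    for (m0, m1), p in p_strata.items():
        # P(M = 1 | stratum) = sum_d 1{m_d = 1} * P(D = d)
        p_stop = p_d[0] * m0 + p_d[1] * m1
        numerator += (ey[(1, m1)] - ey[(0, m0)]) * p * p_stop
        p_m1 += p * p_stop
    return numerator / p_m1


def ate_m1_monte_carlo(p_strata, p_d1, ey, n=200_000, seed=0):
    """Simulate (D, M(0), M(1), Y(1), Y(0)) and average Y(1) - Y(0) over M = 1."""
    rng = random.Random(seed)
    strata = list(p_strata)
    weights = [p_strata[s] for s in strata]
    total, stopped = 0, 0
    for _ in range(n):
        d = 1 if rng.random() < p_d1 else 0
        m0, m1 = rng.choices(strata, weights=weights)[0]
        if (m1 if d == 1 else m0) != 1:
            continue  # the observed M is 0, so this unit is not in the data
        # Recursive substitution: Y(1) = Y(1, M(1)) and Y(0) = Y(0, M(0)).
        y1 = rng.random() < ey[(1, m1)]
        y0 = rng.random() < ey[(0, m0)]
        total += int(y1) - int(y0)
        stopped += 1
    return total / stopped


# Hypothetical inputs, for illustration only.
p_strata = {ALWAYS: 0.1, BLACK_ONLY: 0.3, WHITE_ONLY: 0.2, NEVER: 0.4}
ey = {(1, 1): 0.55, (0, 1): 0.50, (1, 0): 0.10, (0, 0): 0.05}
exact = ate_m1_closed_form(p_strata, 0.1, ey)
approx = ate_m1_monte_carlo(p_strata, 0.1, ey)
print(f"closed form: {exact:.4f}, Monte Carlo: {approx:.4f}")
```

The two numbers should agree up to Monte Carlo error. Note that the never-stop stratum drops out of the closed form automatically because its stopping probability is zero.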

So \(\text{ATE}_{M=1}\) is the weighted average of certain direct and indirect effects. In comparison, a straightforward application of the law of total expectation shows that

\begin{align*} \text{ATE} =& \mathbb{E}[Y(1, 1) - Y(0, 1)] \cdot \mathbb{P}(\text{always stop}) + \mathbb{E}[Y(1, 1) - Y(0, 0)] \cdot \mathbb{P}(\text{black-only stop}) \\ &+ \mathbb{E}[Y(1, 0) - Y(0, 1)] \cdot \mathbb{P}(\text{white-only stop}) + \mathbb{E}[Y(1, 0) - Y(0, 0)] \cdot \mathbb{P}(\text{never stop}). \end{align*}

## A Simpson’s paradox

To understand the difference between \(\text{ATE}\) and \(\text{ATE}_{M=1}\), it is helpful to assume that there is no use of force if there is no stop: \(Y(0,0) = Y(1,0) = 0\). This is called “mandatory reporting” in the Princeton paper. Moreover, let \(B = \mathbb{E}[Y(0,1)]\) be the baseline violence for whites and \(\text{CDE} = \mathbb{E}[Y(1, 1) - Y(0, 1)]\) be the controlled direct effect. Then

\begin{align*} \text{ATE}_{M=1} =& \text{CDE} \cdot \frac{\mathbb{P}(\text{always stop})}{\mathbb{P}(M=1)} + (B + \text{CDE}) \cdot \frac{\mathbb{P}(\text{black-only stop}) \cdot \mathbb{P}(D = 1)}{\mathbb{P}(M=1)} \\ & - B \cdot \frac{\mathbb{P}(\text{white-only stop}) \cdot \mathbb{P}(D = 0)}{\mathbb{P}(M=1)}, \end{align*}

and

\begin{align*} \text{ATE} =& \text{CDE} \cdot \mathbb{P}(\text{always stop}) + (B + \text{CDE}) \cdot \mathbb{P}(\text{black-only stop}) - B \cdot \mathbb{P}(\text{white-only stop}) \\ =& \text{CDE} \cdot [\mathbb{P}(\text{always stop}) + \mathbb{P}(\text{black-only stop})] \\ &+ B \cdot [\mathbb{P}(\text{black-only stop}) - \mathbb{P}(\text{white-only stop})]. \end{align*}

By comparing the above two equations, we see that an **unpleasant
property** of \(\text{ATE}_{M=1}\) is that it depends on the marginal
distribution of \(D\). This matters a great deal when we are
talking about bias towards minorities, where \(\mathbb{P}(D = 1) \ll
\mathbb{P}(D = 0)\). To see why, note that
\(\text{ATE} > 0\) always holds if there are both

- Racial bias in stopping: \(\mathbb{P}(\text{black-only stop}) > \mathbb{P}(\text{white-only stop})\), and
- Racial bias in use of force: \(\text{CDE} = \mathbb{E}[Y(1, 1) - Y(0, 1)] > 0\).

However, the same property is not true for \(\text{ATE}_{M=1}\). If

- The baseline violence for whites \(B = \mathbb{E}[Y(0,1)]\) is much higher than the controlled direct effect \(\text{CDE}\), and
- The proportion of whites \(\mathbb{P}(D=0)\) is high enough such that

\begin{align*} \mathbb{P}(\text{black-only stop}) \cdot \mathbb{P}(D = 1) < \mathbb{P}(\text{white-only stop}) \cdot \mathbb{P}(D = 0), \end{align*}

it is indeed possible that \(\text{ATE}_{M=1} < 0\).

What we are observing here is a sophisticated version of Simpson’s paradox. Even though there is racial bias in every step of the policing, it may not be reflected in the estimand \(\text{ATE}_{M=1}\) because of the conditioning on \(M\). This issue does not occur for \(\text{ATE}\) because it is independent of the marginal distribution of \(D\).
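To make the sign flip concrete, here is a small Python sketch that evaluates the two mandatory-reporting formulas for \(\text{ATE}\) and \(\text{ATE}_{M=1}\) above. The function names and all numerical values are hypothetical and chosen only so that both kinds of racial bias are present while \(B\) is much larger than \(\text{CDE}\).

```python
def ate(cde, b, p_always, p_black_only, p_white_only):
    """ATE under mandatory reporting; it does not involve P(D = 1)."""
    return cde * (p_always + p_black_only) + b * (p_black_only - p_white_only)


def ate_m1(cde, b, p_always, p_black_only, p_white_only, p_d1):
    """ATE_{M=1} under mandatory reporting; it depends on P(D = 1)."""
    w_black = p_black_only * p_d1          # black-only stop and D = 1
    w_white = p_white_only * (1.0 - p_d1)  # white-only stop and D = 0
    p_m1 = p_always + w_black + w_white    # P(M = 1)
    return (cde * p_always + (b + cde) * w_black - b * w_white) / p_m1


# Hypothetical values: bias in stopping (0.3 > 0.2) and in use of force (CDE > 0).
params = dict(cde=0.05, b=0.5, p_always=0.1, p_black_only=0.3, p_white_only=0.2)
print(f"ATE = {ate(**params):.3f}")
for p_d1 in (0.1, 0.3, 0.5):
    print(f"P(D=1) = {p_d1:.1f}: ATE_M1 = {ate_m1(**params, p_d1=p_d1):+.3f}")
```

With these made-up numbers, \(\text{ATE} = 0.07\) regardless of \(\mathbb{P}(D=1)\), whereas \(\text{ATE}_{M=1}\) is roughly \(-0.22\) when \(\mathbb{P}(D=1) = 0.1\) and only becomes positive once \(\mathbb{P}(D=1)\) exceeds about \(0.36\).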

## Conclusions

In summary, we need to be very careful when interpreting
\(\text{ATE}_{M=1}\) as a “causal” estimand. In certain circumstances,
\(\text{ATE}_{M=1}\) may not even have the same sign as the underlying direct
and indirect effects of race. When it is reasonable to assume
mediator monotonicity (one of the necessary assumptions in the
Princeton paper), that is \(\mathbb{P}(\text{white-only stop}) = 0\),
this Simpson’s paradox disappears. However, mediator monotonicity and
mandatory reporting also guarantee that the indirect effect is
positive. So mediator monotonicity is an ironic assumption in the
sense that it makes a study of racial bias biased *a priori*.

## Update

One may wonder what might happen if \(\text{ATT}_{M=1}\) is used. Using Bayes’ formula (and treatment ignorability), we have

\[ \mathbb{P}(M(0) = m_0, M(1) = m_1 \mid D = 1, M = 1) = 1_{\{m_1 = 1\}} \cdot \frac{\mathbb{P}(M(0) = m_0, M(1) = m_1)}{\mathbb{P}(M=1 \mid D=1)}. \]

So following the same argument as above (and, in the last step, mandatory reporting),

\begin{align*} \text{ATT}_{M=1} =& \mathbb{E}[Y(1) - Y(0) \mid D=1, M=1] \\ =& \frac{\mathbb{E}[Y(1,1) - Y(0,1)] \cdot \mathbb{P}(\text{always stop}) + \mathbb{E}[Y(1,1) - Y(0,0)] \cdot \mathbb{P}(\text{black-only stop})}{\mathbb{P}(M=1 \mid D=1)} \\ =& \frac{\text{CDE} \cdot \mathbb{P}(\text{always stop}) + (\text{CDE} + B) \cdot \mathbb{P}(\text{black-only stop})}{\mathbb{P}(M=1 \mid D=1)}. \end{align*}

This unfortunately still has a potential Simpson’s paradox (against whites). That is, even if \(\text{CDE} < 0\) and \(\mathbb{P}(\text{black-only stop}) < \mathbb{P}(\text{white-only stop})\), so that whites are discriminated against in both steps, it is still possible that \(\text{ATT}_{M=1} > 0\) (blacks appear to be discriminated against).
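A quick numerical illustration of this reversed paradox, as a sketch: the function name and the numbers below are hypothetical, and the formula is the mandatory-reporting expression for \(\text{ATT}_{M=1}\) above, using \(\mathbb{P}(M=1 \mid D=1) = \mathbb{P}(\text{always stop}) + \mathbb{P}(\text{black-only stop})\), which follows from treatment ignorability.

```python
def att_m1(cde, b, p_always, p_black_only):
    """ATT_{M=1} under mandatory reporting and treatment ignorability."""
    p_m1_given_d1 = p_always + p_black_only  # P(M(1) = 1)
    return (cde * p_always + (cde + b) * p_black_only) / p_m1_given_d1


# Hypothetical values in which whites are disadvantaged at both steps:
# CDE < 0 and P(black-only stop) < P(white-only stop).
cde, b = -0.05, 0.5
p_always, p_black_only, p_white_only = 0.1, 0.2, 0.3

value = att_m1(cde, b, p_always, p_black_only)
print(f"CDE = {cde}, ATT_M1 = {value:.3f}")
```

Here \(\text{ATT}_{M=1}\) comes out positive (about \(0.28\)) even though \(\text{CDE} < 0\), because the term \(B \cdot \mathbb{P}(\text{black-only stop})\) dominates and \(\mathbb{P}(\text{white-only stop})\) does not enter the formula at all.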