Partial Identification of the Average Treatment Effect with Stochastic Counterfactuals and Discordant Twins

Brian Knaeble, Braxton Osting, Placede Tshiaba
2024-07-27
Abstract: We develop a novel approach to partially identify causal estimands, such as the average treatment effect (ATE), from observational data. To better satisfy the stable unit treatment value assumption (SUTVA), we utilize stochastic counterfactuals within a propensity-prognosis model of the data-generating process. For more precise identification, we utilize knowledge of discordant twin outcomes as evidence for randomness in the data-generating process. Our approach culminates in a constrained optimization problem whose solution gives upper and lower bounds for the ATE. We demonstrate the applicability of our methodology with three example applications.
Methodology, Optimization and Control
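The role of discordant twins can be illustrated with a small simulation of exposures under a propensity-prognosis model. This sketch relies on several of our own assumptions, which are not necessarily the paper's setup. First, both twins in a pair share the same propensity \(\pi\), drawn from an arbitrary Beta(2, 3) distribution. Second, their exposures are independent Bernoulli(\(\pi\)) draws. Third, the twin concordance rate \(BC\) is read as the probandwise concordance \(2C/(2C + D)\), where \(C\) counts pairs in which both twins are exposed and \(D\) counts discordant pairs. Under these assumptions \(BC = E[\pi^2]/E[\pi]\), so \(\operatorname{Var}(\pi) = P(e=1)\,(BC - P(e=1))\) holds exactly. The paper's Proposition 3.1 states the corresponding relationship as an inequality. The helper names are hypothetical.

```python
import numpy as np


def simulate_twin_pairs(n_pairs, a=2.0, b=3.0, seed=0):
    """Hypothetical generator: each pair shares a propensity pi ~ Beta(a, b);
    each twin's exposure is an independent Bernoulli(pi) draw."""
    rng = np.random.default_rng(seed)
    pi = rng.beta(a, b, size=n_pairs)
    e1 = rng.random(n_pairs) < pi  # twin 1 exposed with probability pi
    e2 = rng.random(n_pairs) < pi  # twin 2: same pi, independent draw
    return pi, e1, e2


def probandwise_concordance(e1, e2):
    """Probandwise concordance 2C / (2C + D), where C counts pairs with both
    twins exposed and D counts discordant pairs."""
    both = np.count_nonzero(e1 & e2)
    discordant = np.count_nonzero(e1 ^ e2)
    return 2 * both / (2 * both + discordant)


pi, e1, e2 = simulate_twin_pairs(400_000)
p = np.concatenate([e1, e2]).mean()   # estimate of P(e = 1)
bc = probandwise_concordance(e1, e2)  # estimate of the concordance rate
print(f"Var(pi) = {pi.var():.4f}, p * (BC - p) = {p * (bc - p):.4f}")
```

In this idealized setting the two printed quantities agree up to sampling error. Both should be close to 0.04, which is the exact variance of Beta(2, 3). If exposure were deterministic (\(\pi \in \{0, 1\}\)), every pair would be concordant, giving \(BC = 1\), and the bound would become \(p(1-p)\). That is the largest variance a \([0,1]\)-valued \(\pi\) with mean \(p\) can have. Observed discordance lowers \(BC\) and therefore tightens the bound.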
What problem does this paper attempt to address?
This paper addresses the partial identification of causal estimands, such as the average treatment effect (ATE), from observational data. To better satisfy the stable unit treatment value assumption (SUTVA), the authors introduce stochastic counterfactuals within a model of the data-generating process built from propensity probabilities and prognosis probabilities. They sharpen identification further by using the outcomes of discordant twins as evidence of randomness in the data-generating process.

### Main contributions

1. **Derived an inequality**: This inequality links a twin concordance measure to the variance of the propensity probability (see Proposition 3.1).
2. **Proposed a constrained optimization problem**: This problem is used to partially identify the ATE (see Equation (4)).
3. **Described a method for approximately solving the optimization problem**: The problem is solved approximately by linear programming.

### Method overview

- **Propensity-prognosis model**: This model assumes that each individual's exposure (or treatment) and outcome (or disease) are governed by a propensity probability \(\pi_i\) and prognosis probabilities \(r_{0i}\) and \(r_{1i}\):
  - \( e_i \sim \text{Bernoulli}(\pi_i) \)
  - \( d_i(e_i = 0) \sim \text{Bernoulli}(r_{0i}) \)
  - \( d_i(e_i = 1) \sim \text{Bernoulli}(r_{1i}) \)
- **Twin study**: The outcomes of discordant twins are used to gauge the randomness in the data-generating process. The premise is that discordant twin pairs indicate that the exposure (and likewise the outcome) process is random rather than deterministic.
- **Constrained optimization problem**: The ATE is partially identified by solving a constrained optimization problem.
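The linear-programming approximation listed among the contributions can be sketched under our own assumptions rather than the paper's exact construction. The sketch takes \(C = [0,1]^3\) (an assumption) and restricts \(m\) to nonnegative weights on a finite grid of points \((\pi, r_0, r_1)\). Every integral in Equation (4) then becomes linear in those weights. This includes the two variance constraints, because a term such as \((\pi - P(e=1))^2\) is just a fixed number at each grid point. SciPy's `linprog` (HiGHS method) can therefore compute the minimum and maximum of \(\int_C (r_1 - r_0)\,dm\). The function name `ate_bounds` and the grid size `n_grid` are hypothetical. So is the choice \(r = (1-\pi) r_0 + \pi r_1\) for the individual's overall outcome probability, although this choice is consistent with the moment constraints, which force \(\int_C r\,dm = P(d = 1)\).

```python
import itertools

import numpy as np
from scipy.optimize import linprog


def ate_bounds(p_e0d1, p_e1d1, p_e0d0, p_e1d0, bc_e, bc_d, n_grid=11):
    """Hypothetical LP sketch: (lower, upper) ATE bounds over measures on a grid."""
    g = np.linspace(0.0, 1.0, n_grid)
    # Grid points (pi, r0, r1) discretizing C = [0, 1]^3; one LP weight per point.
    pts = np.array(list(itertools.product(g, g, g)))
    pi, r0, r1 = pts[:, 0], pts[:, 1], pts[:, 2]
    p_e = p_e1d1 + p_e1d0  # P(e = 1)
    p_d = p_e0d1 + p_e1d1  # P(d = 1)
    r = (1 - pi) * r0 + pi * r1  # ASSUMED overall outcome probability

    # Moment constraints matching the observed joint distribution of (e, d).
    # Their right-hand sides sum to 1, so they also force the weights to sum to 1.
    a_eq = np.vstack([(1 - pi) * r0, pi * r1, (1 - pi) * (1 - r0), pi * (1 - r1)])
    b_eq = np.array([p_e0d1, p_e1d1, p_e0d0, p_e1d0])
    # Twin-based variance constraints; linear in the weights on a fixed grid.
    a_ub = np.vstack([(pi - p_e) ** 2, (r - p_d) ** 2])
    b_ub = np.array([p_e * (bc_e - p_e), p_d * (bc_d - p_d)])

    objective = r1 - r0  # integrand of the ATE
    out = []
    for sign in (1.0, -1.0):  # +1 minimizes the ATE, -1 maximizes it
        res = linprog(sign * objective, A_ub=a_ub, b_ub=b_ub,
                      A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
        if not res.success:
            raise ValueError(f"LP failed: {res.message}")
        out.append(sign * res.fun)
    return out[0], out[1]


# Made-up joint probabilities P(e, d), for illustration only.
print(ate_bounds(0.1, 0.2, 0.5, 0.2, bc_e=1.0, bc_d=1.0))
print(ate_bounds(0.1, 0.2, 0.5, 0.2, bc_e=0.6, bc_d=0.5))
```

With \(BC_e = BC_d = 1\) the variance constraints cannot bind, because any \([0,1]\)-valued quantity with mean \(p\) has variance at most \(p(1-p)\). For these made-up numbers, the first call therefore returns the no-assumption bounds of approximately \([-0.3, 0.7]\), an interval of width 1. Smaller concordance values can only narrow the interval, as the second call shows.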
The optimization problem takes the form

\[ \text{ATE}_{\min} = \min_{m \in P(C)} \int_C (r_1 - r_0) \, dm, \qquad \text{ATE}_{\max} = \max_{m \in P(C)} \int_C (r_1 - r_0) \, dm, \]

where \(m\) ranges over \(P(C)\), the set of probability measures on the space \(C\) of individual parameters \((\pi, r_0, r_1)\). Here \(BC_e\) and \(BC_d\) are the twin concordance measures for exposure and disease. The constraints are:

- \(\int_C (1 - \pi) r_0 \, dm = P(e = 0, d = 1)\)
- \(\int_C \pi r_1 \, dm = P(e = 1, d = 1)\)
- \(\int_C (1 - \pi)(1 - r_0) \, dm = P(e = 0, d = 0)\)
- \(\int_C \pi (1 - r_1) \, dm = P(e = 1, d = 0)\)
- \(\int_C (\pi - P(e = 1))^2 \, dm \leq P(e = 1)\,(BC_e - P(e = 1))\)
- \(\int_C (r - P(d = 1))^2 \, dm \leq P(d = 1)\,(BC_d - P(d = 1))\)

### Application examples

The paper demonstrates the applicability of the method through three applications:

1. **Whether diabetes causes stroke**: From the data in Table 1, the computed interval for the ATE is \([-0.01, 0.48]\). This fails to establish that the ATE is positive.
2. **Whether smoking causes chronic obstructive pulmonary disease (COPD)**: From the data in Table 2, the computed interval for the ATE is \([0.03, 0.21]\). This shows that the ATE is positive.
3. **Whether marijuana use causes hard drug use**: From the data in Table 3, the computed interval for the ATE is \([0.11,\)