Abstract: We develop a novel approach to partially identify causal estimands, such as the average treatment effect (ATE), from observational data. To better satisfy the stable unit treatment value assumption (SUTVA) we utilize stochastic counterfactuals within a propensity-prognosis model of the data generating process. For more precise identification we utilize knowledge of discordant twin outcomes as evidence for randomness in the data generating process. Our approach culminates with a constrained optimization problem; the solution gives upper and lower bounds for the ATE. We demonstrate the applicability of our introduced methodology with three example applications.

What problem does this paper attempt to address?

This paper addresses the problem of partially identifying the average treatment effect (ATE) from observational data. Specifically, it develops a new method to partially identify causal estimands such as the ATE. To better satisfy the stable unit treatment value assumption (SUTVA), the method introduces stochastic counterfactuals within a model of the data-generating process built from propensity probabilities and prognosis probabilities. In addition, it uses discordant twin outcomes as evidence of randomness in the data-generating process, which makes the identification more precise.

### Main contributions

1. **Derived an inequality**: This inequality links the consistency measure to the variance of the propensity probability (see Proposition 3.1).
2. **Proposed a constrained optimization problem**: This problem is used to partially identify the ATE (see Equation (4)).
3. **Described a method for approximately solving the optimization problem**: The optimization problem above is solved approximately through linear programming.

### Method overview

- **Propensity-prognosis model**: This model assumes that the exposure (or treatment) and outcome (or disease) of each individual can be described by propensity probabilities and prognosis probabilities. The specific forms are as follows:
  - \( e_i \sim \text{Bernoulli}(\pi_i) \)
  - \( d_i(e_i = 0) \sim \text{Bernoulli}(r_{0i}) \)
  - \( d_i(e_i = 1) \sim \text{Bernoulli}(r_{1i}) \)
- **Twin study**: The outcomes of discordant twins are used to estimate the randomness in the data-generating process. It is assumed that the existence of discordant twins indicates that the exposure process is more likely to be random.
- **Constrained optimization problem**: The ATE is partially identified by solving a constrained optimization problem. The form of the optimization problem is as follows:

\[ \text{ATE}_{\text{min/max}} = \min/\max_{m \in P(C)} \int_C (r_1 - r_0) \, dm \]

where the constraints include:

- \(\int_C (1 - \pi) r_0 \, dm = P(e = 0, d = 1)\)
- \(\int_C \pi r_1 \, dm = P(e = 1, d = 1)\)
- \(\int_C (1 - \pi)(1 - r_0) \, dm = P(e = 0, d = 0)\)
- \(\int_C \pi (1 - r_1) \, dm = P(e = 1, d = 0)\)
- \(\int_C (\pi - P(e = 1))^2 \, dm \leq P(e = 1)(BC_e - P(e = 1))\)
- \(\int_C (r - P(d = 1))^2 \, dm \leq P(d = 1)(BC_d - P(d = 1))\)

### Application examples

The paper demonstrates the method with three practical applications:

1. **Whether diabetes causes stroke**: Using the data in Table 1, the ATE is bounded within \([-0.01, 0.48]\), so a positive ATE cannot be established.
2. **Whether smoking causes chronic obstructive pulmonary disease (COPD)**: Using the data in Table 2, the ATE is bounded within \([0.03, 0.21]\), indicating that the ATE is positive.
3. **Whether marijuana use causes hard drug use**: Using the data in Table 3, the bounds on the ATE are calculated as \([0.11,\)
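
The linear-programming approximation mentioned under the main contributions could look roughly like the sketch below. It is not the paper's implementation: it assumes the measure \(m\) is replaced by nonnegative weights on a regular grid over \((\pi, r_0, r_1) \in [0,1]^3\), and it assumes that the undefined symbol \(r\) in the last constraint is the marginal outcome probability \(r = (1 - \pi) r_0 + \pi r_1\). The function name `ate_bounds`, its parameters, and the grid size are hypothetical, and SciPy's `linprog` stands in for whatever solver the authors use.

```python
import numpy as np
from scipy.optimize import linprog


def ate_bounds(p00, p01, p10, p11, bc_e=1.0, bc_d=1.0, n_grid=11):
    """Approximate lower/upper ATE bounds with a gridded linear program.

    pXY is the observed probability P(e = X, d = Y).
    bc_e and bc_d play the role of BC_e and BC_d in the variance constraints.
    """
    # Grid over C = [0, 1]^3; each grid point carries one weight m_k >= 0.
    g = np.linspace(0.0, 1.0, n_grid)
    pi, r0, r1 = (a.ravel() for a in np.meshgrid(g, g, g, indexing="ij"))

    pe = p10 + p11  # P(e = 1)
    pd = p01 + p11  # P(d = 1)

    # ASSUMPTION: r is the marginal outcome probability of an individual.
    r = (1.0 - pi) * r0 + pi * r1

    # Objective: integral of (r1 - r0) dm becomes a weighted sum.
    c = r1 - r0

    # The four observed cell probabilities (they sum to 1, so the
    # weights are automatically normalized to a probability measure).
    A_eq = np.vstack([
        (1.0 - pi) * r0,          # P(e = 0, d = 1)
        pi * r1,                  # P(e = 1, d = 1)
        (1.0 - pi) * (1.0 - r0),  # P(e = 0, d = 0)
        pi * (1.0 - r1),          # P(e = 1, d = 0)
    ])
    b_eq = np.array([p01, p11, p00, p10])

    # Variance constraints on the propensity and outcome probabilities.
    A_ub = np.vstack([(pi - pe) ** 2, (r - pd) ** 2])
    b_ub = np.array([pe * (bc_e - pe), pd * (bc_d - pd)])

    results = []
    for sign in (1.0, -1.0):  # +1 minimizes the ATE, -1 maximizes it
        res = linprog(sign * c, A_ub=A_ub, b_ub=b_ub,
                      A_eq=A_eq, b_eq=b_eq, method="highs")
        if not res.success:
            raise ValueError(f"LP did not solve: {res.message}")
        results.append(sign * res.fun)
    return results[0], results[1]
```

Calling `ate_bounds(0.4, 0.1, 0.2, 0.3, bc_e=0.7, bc_d=0.6)` with these made-up cell probabilities returns a `(lower, upper)` pair. Leaving `bc_e` and `bc_d` at 1 makes both variance constraints uninformative, so tightening them can only narrow the interval. A finer grid gives a closer approximation at the cost of a larger program.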

Partial Identification of the Average Treatment Effect with Stochastic Counterfactuals and Discordant Twins

Robust nonparametric estimation of average treatment effects: A propensity score‐based varying coefficient approach

The alpha-synucleinopathies: Parkinson's disease, dementia with Lewy bodies, and multiple system atrophy.

Bounded, efficient and multiply robust estimation of average treatment effects using instrumental variables

A nonparametric super-efficient estimator of the average treatment effect

Double Robust Bayesian Inference on Average Treatment Effects

Estimating individual treatment effect: generalization bounds and algorithms

High Dimensional Propensity Score Estimation via Covariate Balancing

Estimating Heterogeneous Treatment Effects by Combining Weak Instruments and Observational Data

Robust and Efficient Semi-Supervised Estimation of Average Treatment Effects with Application to Electronic Health Records Data

A Differential Effect Approach to Partial Identification of Treatment Effects

Model-Agnostic Covariate-Assisted Inference on Partially Identified Causal Effects

Doubly Robust Estimation in Causal Inference with Missing Outcomes: with an Application to the Aerobics Center Longitudinal Study

Undersmoothing Causal Estimators With Generative Trees

Estimation of Average Treatment Effect Based on a Multi-Index Propensity Score.

Selective Machine Learning of the Average Treatment Effect with an Invalid Instrumental Variable

Sparsity Double Robust Inference of Average Treatment Effects

Interval Estimation of Individual-Level Causal Effects under Unobserved Confounding

The Informativeness of Combined Experimental and Observational Data under Dynamic Selection

Stochastic Intervention for Causal Effect Estimation

Identification of the Heterogeneous Survivor Average Causal Effect in Observational Studies