Ensembled Prediction Intervals for Causal Outcomes Under Hidden Confounding

Myrl G. Marmarelis, Greg Ver Steeg, Aram Galstyan, Fred Morstatter
2023-11-01
Abstract: Causal inference of exact individual treatment outcomes in the presence of hidden confounders is rarely possible. Recent work has extended prediction intervals with finite-sample guarantees to partially identifiable causal outcomes, by means of a sensitivity model for hidden confounding. In deep learning, predictors can exploit their inductive biases for better generalization out of sample. We argue that the structure inherent to a deep ensemble should inform a tighter partial identification of the causal outcomes that they predict. We therefore introduce an approach termed Caus-Modens, for characterizing causal outcome intervals by modulated ensembles. We present a simple approach to partial identification using existing causal sensitivity models and show empirically that Caus-Modens gives tighter outcome intervals, as measured by the necessary interval size to achieve sufficient coverage. The last of our three diverse benchmarks is a novel usage of GPT-4 for observational experiments with unknown but probeable ground truth.
Machine Learning, Artificial Intelligence, Methodology
What problem does this paper attempt to address?
The paper addresses how to obtain tighter prediction intervals for causal outcomes in the presence of hidden confounders. Specifically, it focuses on improving the prediction intervals of partially identifiable causal outcomes with ensemble predictors when not all confounders are observed. It proposes a new method, Caus-Modens, which characterizes causal outcome intervals by modulating an ensemble, yielding tighter uncertainty estimates under partial identification.

### Background and Motivation of the Paper

1. **Challenges in Causal Inference**:
   - A core problem in causal inference is handling confounders, especially hidden confounders that are not observed.
   - Hidden confounders make it difficult to estimate individual treatment effects accurately, because they may affect the relationship between treatment assignment and outcome.
2. **Limitations of Existing Methods**:
   - Existing methods, such as sensitivity analysis, can account for hidden confounders to some extent, but they usually produce wider prediction intervals, which makes the results less practical.
   - Conformalized causal sensitivity analysis provides finite-sample guarantees, but its performance is limited on highly biased data.
3. **Innovations of the Paper**:
   - The paper introduces Caus-Modens, which uses the inductive bias of a deep ensemble to improve the prediction intervals of partially identifiable causal outcomes.
   - By re-weighting the predictors in the ensemble, Caus-Modens can generate tighter prediction intervals while maintaining partial identifiability.

### Method Overview

1. **Basic Assumptions**:
   - **Potential Outcome Assumption**: The observed data \((y^{(i)}, t^{(i)}, x^{(i)})\) are independently and identically distributed (i.i.d.) draws from a joint distribution.
   - **Non-zero Treatment Probability**: Each individual has a non-zero probability of receiving any treatment.
   - **Hidden Confounders**: The potential outcomes \(Y_t\) and the treatment \(T\) are allowed to be dependent given the covariates \(X\); that is, hidden confounders may be present.
2. **Sensitivity Model**:
   - A sensitivity model quantifies the influence of hidden confounders. For example, for binary treatments, the Marginal Sensitivity Model (MSM) bounds the range of the ratio between the nominal propensity score and the complete propensity score.
3. **Caus-Modens Method**:
   - The upper and lower bounds of the prediction interval are optimized by re-weighting the predictors in the ensemble.
   - Specifically, for a given individual and treatment, the weights \(\omega(\theta, t, x)\) are adjusted to minimize or maximize the conditional quantiles, which yields tighter prediction intervals for the causal outcomes.

### Experimental Verification

1. **Benchmark Tests**:
   - The paper runs experiments on three different benchmarks: the classic IHDP dataset, the new PBMC dataset, and an observational experiment using GPT-4.
   - Caus-Modens is compared with existing conformalized causal sensitivity analysis methods (such as Ens-CSA-DCP and Ens-CSA-CQR), and it is shown to generate tighter prediction intervals.
2. **Evaluation Metrics**:
   - The main evaluation metric is coverage efficiency, that is, the size of the prediction interval required to achieve the target coverage rate.
   - The results show that Caus-Modens generates smaller prediction intervals on multiple datasets, especially on highly biased data.
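The coverage-efficiency metric described above can be illustrated with a minimal sketch. It is not the paper's evaluation code: the function names (`empirical_coverage`, `mean_width`, `coverage_efficiency`), the toy data, and the 0.9 target are all assumptions made for illustration. The idea is to take several candidate sets of intervals (for instance, one per value of a sensitivity parameter), keep those that reach the target coverage, and report the smallest average interval size among them.

```python
from typing import Optional, Sequence, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def empirical_coverage(intervals: Sequence[Interval], outcomes: Sequence[float]) -> float:
    """Fraction of true outcomes that fall inside their predicted interval."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, outcomes))
    return hits / len(outcomes)


def mean_width(intervals: Sequence[Interval]) -> float:
    """Average interval size (upper bound minus lower bound)."""
    return sum(hi - lo for lo, hi in intervals) / len(intervals)


def coverage_efficiency(
    candidates: Sequence[Sequence[Interval]],
    outcomes: Sequence[float],
    target: float = 0.9,
) -> Optional[float]:
    """Smallest mean width among candidate interval sets that reach the target coverage.

    Returns None when no candidate reaches the target.
    """
    widths = [
        mean_width(c) for c in candidates
        if empirical_coverage(c, outcomes) >= target
    ]
    return min(widths) if widths else None


# Toy data, made up for illustration: three interval sets of growing size.
outcomes = [1.0, 2.0, 3.0, 4.0]
narrow = [(0.9, 1.1), (2.5, 2.7), (2.9, 3.1), (4.5, 4.7)]  # covers 2 of 4 outcomes
medium = [(0.5, 1.5), (1.5, 2.5), (2.5, 3.5), (3.5, 4.5)]  # covers 4 of 4 outcomes
wide = [(0.0, 2.0), (1.0, 3.0), (2.0, 4.0), (3.0, 5.0)]    # covers 4 of 4 outcomes

result = coverage_efficiency([narrow, medium, wide], outcomes, target=0.9)
print(result)  # 1.0: the medium set is the smallest one that reaches 90% coverage
```

Under this reading, a method is more coverage-efficient when it reaches the target coverage with a smaller interval size, which is the sense in which the summary describes Caus-Modens intervals as tighter.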
### Conclusion

Caus-Modens, the method proposed in the paper, improves the prediction intervals of causal outcomes in the presence of hidden confounders by modulating the ensemble model. The experimental results show that Caus