Abstract: Causal inferences from a randomized controlled trial (RCT) may not pertain to a target population where some effect modifiers have a different distribution. Prior work studies generalizing the results of a trial to a target population for which covariate data, but no outcome data, are available. We show how the limited size of trials makes generalization a statistically infeasible task, as it requires estimating complex nuisance functions. We develop generalization algorithms that supplement the trial data with a prediction model learned from an additional observational study (OS), without making any assumptions on the OS. We theoretically and empirically show that our methods facilitate better generalization when the OS is high-quality, and remain robust when it is not, e.g., when it has unmeasured confounding.

What problem does this paper attempt to address?

This paper addresses how to generalize the results of a randomized controlled trial (RCT) to a target population in which the effect modifiers, the factors that change how strongly the treatment affects the outcome, are distributed differently than in the trial. It argues that generalizing from RCT data alone can be statistically infeasible, because doing so requires estimating complex nuisance functions from a small sample. To address this, the paper supplements the RCT data with a prediction model learned from an additional observational study (OS), even when the OS may be biased. The goal is to improve generalization accuracy when the OS is high-quality and to remain robust when it is not.
The authors first note the limitations of RCTs: they are costly and time-consuming, their external validity is limited, and their results do not apply directly to target populations with different covariate distributions. They then propose generalization algorithms that combine the RCT with potentially biased OS data to improve the estimation of causal effects. Because the OS enters only through a machine learning prediction model, the methods can exploit the additional data without making assumptions about the OS, and they reduce generalization error when the OS is high-quality.
The paper introduces key concepts such as effect modifiers, potential outcomes, and confounding bias, and states the assumptions that support causal inference: consistency, ignorability of treatment assignment, and positivity. It then shows how to estimate the average causal effect in the target population by combining the RCT data with the OS-based prediction model, even when the OS has unmeasured confounding. The paper reports that its methods substantially improve estimation performance over previous work, especially when the OS is high-quality, and validates this finding through extensive simulations over the data-generating process.
Finally, the paper proposes two new identification methods, additive bias correction and enhanced outcome modeling, that integrate prediction models for more statistically efficient generalization estimation. Overall, the paper aims to generalize causal inferences from an RCT more accurately and robustly to target populations with different covariate distributions by combining RCT and OS data.
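To make the idea of additive bias correction concrete, the following is a minimal sketch on synthetic data. It is not the authors' algorithm; the summary above does not spell out the estimator, so the form used here is an assumption: take the OS predictor's output on the target covariates and add a correction fitted to the predictor's residuals in each trial arm. The data-generating process, the biased `os_predictor`, and the linear bias model are all hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def true_outcome_mean(x, a):
    # Hypothetical data-generating process: x modifies the treatment effect.
    return 1.0 + 0.5 * x + a * (1.0 + 2.0 * x)


def os_predictor(x, a):
    # Stand-in for a model learned from the OS (an assumption): it is
    # biased in the treated arm, e.g. because of unmeasured confounding.
    return true_outcome_mean(x, a) + 0.3 * a


def fit_linear(x, y):
    # Ordinary least squares of y on an intercept and x.
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_linear(coef, x):
    return coef[0] + coef[1] * x


def bias_corrected_ate(x_trial, a_trial, y_trial, x_target, predictor):
    # For each arm: fit the predictor's residual on trial data, then
    # average (prediction + estimated bias) over the target covariates.
    arm_means = []
    for arm in (0, 1):
        mask = a_trial == arm
        residual = y_trial[mask] - predictor(x_trial[mask], arm)
        coef = fit_linear(x_trial[mask], residual)
        corrected = predictor(x_target, arm) + predict_linear(coef, x_target)
        arm_means.append(corrected.mean())
    return arm_means[1] - arm_means[0]


# Small trial, larger target sample with a shifted effect modifier.
n_trial, n_target = 500, 5000
x_trial = rng.normal(0.0, 1.0, n_trial)
a_trial = rng.integers(0, 2, n_trial)  # randomized treatment assignment
y_trial = true_outcome_mean(x_trial, a_trial) + rng.normal(0.0, 1.0, n_trial)
x_target = rng.normal(1.0, 1.0, n_target)  # covariates only, no outcomes

estimate = bias_corrected_ate(x_trial, a_trial, y_trial, x_target, os_predictor)
naive_trial = y_trial[a_trial == 1].mean() - y_trial[a_trial == 0].mean()
true_target_ate = (1.0 + 2.0 * x_target).mean()

print(f"true target ATE:         {true_target_ate:.2f}")
print(f"trial-only difference:   {naive_trial:.2f}")
print(f"bias-corrected estimate: {estimate:.2f}")
```

In this toy setup the trial-only difference in means targets the effect in the trial population, while the corrected estimate targets the shifted population. The residual model only has to capture the predictor's error, which is the intuition behind leaning on a good OS predictor; a poor predictor leaves more for the trial data to correct.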

Prediction-powered Generalization of Causal Inferences

Generalizing causal inferences from individuals in randomized trials to all trial-eligible individuals

Estimating individual treatment effect: generalization bounds and algorithms

Causal Inference and Counterfactual Prediction in Machine Learning for Actionable Healthcare

Testing Generalizability in Causal Inference

Generalization Bounds for Causal Regression: Insights, Guarantees and Sensitivity Analysis

Towards Generalizing Inferences from Trials to Target Populations

Generalizing and transporting causal inferences from randomized trials in the presence of trial engagement effects

Improving transportability of randomized controlled trial inference using robust prediction methods

Ensembled Prediction Intervals for Causal Outcomes Under Hidden Confounding

Generalization bounds and algorithms for estimating conditional average treatment effect of dosage

genRCT: a statistical analysis framework for generalizing RCT findings to real-world population

Predicting Counterfactuals from Large Historical Data and Small Randomized Trials

Efficient combination of observational and experimental datasets under general restrictions on outcome mean functions

Can predictive models be used for causal inference?

Study designs for extending causal inferences from a randomized trial to a target population

Automated, efficient and model-free inference for randomized clinical trials via data-driven covariate adjustment

Prognostic Covariate Adjustment for Logistic Regression in Randomized Controlled Trials

Explaining the Behavior of Black-Box Prediction Algorithms with Causal Learning

Generalizing treatment effects with incomplete covariates: Identifying assumptions and multiple imputation algorithms

Precise unbiased estimation in randomized experiments using auxiliary observational data