Abstract: Confounding is a significant obstacle to unbiased estimation of causal effects from observational data. For settings with high-dimensional covariates -- such as text data, genomics, or the behavioral social sciences -- researchers have proposed methods to adjust for confounding by adapting machine learning methods to the goal of causal estimation. However, empirical evaluation of these adjustment methods has been challenging and limited. In this work, we build on a promising empirical evaluation strategy that simplifies evaluation design and uses real data: subsampling randomized controlled trials (RCTs) to create confounded observational datasets while using the average causal effects from the RCTs as ground-truth. We contribute a new sampling algorithm, which we call RCT rejection sampling, and provide theoretical guarantees that causal identification holds in the observational data to allow for valid comparisons to the ground-truth RCT. Using synthetic data, we show our algorithm indeed results in low bias when oracle estimators are evaluated on the confounded samples, which is not always the case for a previously proposed algorithm. In addition to this identification result, we highlight several finite data considerations for evaluation designers who plan to use RCT rejection sampling on their own datasets. As a proof of concept, we implement an example evaluation pipeline and walk through these finite data considerations with a novel, real-world RCT -- which we release publicly -- consisting of approximately 70k observations and text data as high-dimensional covariates. Together, these contributions build towards a broader agenda of improved empirical evaluation for causal estimation.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper addresses the empirical evaluation of methods that estimate causal effects from observational data with high-dimensional covariates (such as text data, genomics, or data in the behavioral social sciences). Specifically:

1. **Problem Background**:
   - Confounding is a significant obstacle to unbiased estimation of causal effects from observational data.
   - When the number of covariates is large (as in natural language processing, genetics, or the behavioral social sciences), simple traditional estimation strategies (such as parametric models or contingency tables) are inadequate.
2. **Limitations of Existing Methods**:
   - Although various complex methods have been proposed in recent years to adjust for confounding, they have not yet been benchmarked systematically and empirically.
   - Supervised learning can evaluate predictive performance against true labels on a held-out test set, whereas causal inference would need true labels for counterfactual outcomes, which are difficult to obtain in practice.
3. **Solution**:
   - A new sampling algorithm, RCT rejection sampling, is proposed. It subsamples a randomized controlled trial (RCT) to create a confounded observational dataset while ensuring that the causal effect remains identifiable in that dataset, so estimates can be compared with the RCT's average causal effect as ground truth.
   - The paper provides theoretical guarantees that causal identification holds in the sampled data, which is not guaranteed by a previously proposed algorithm.
   - Synthetic data is used to show that oracle estimators evaluated on the confounded samples have low bias under the new algorithm, which is not always the case for the previously proposed algorithm.
4. **Specific Contributions**:
   - Provides a new RCT sampling algorithm and its theoretical guarantees.
   - Demonstrates on synthetic data that the algorithm yields lower bias than a previous method.
   - Implements a proof-of-concept evaluation pipeline and publicly releases a real-world RCT dataset containing approximately 70,000 observations with text covariates.

Overall, this paper aims to provide a more reliable framework for evaluating high-dimensional causal inference methods by improving how RCTs are subsampled.
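
To make the subsampling idea in the Solution item concrete, here is a minimal Python sketch on synthetic data. It is not the paper's reference implementation: the acceptance rule (keep a unit with probability proportional to P*(T | C) / P(T)) is a standard rejection-sampling form assumed here for illustration, and the names `rct_rejection_sample`, `p_star`, `diff_in_means`, and `backdoor_adjusted`, the single binary covariate, and the data-generating process are all hypothetical choices.

```python
import numpy as np

def diff_in_means(t, y):
    """Difference in mean outcome between treated and untreated units."""
    return y[t == 1].mean() - y[t == 0].mean()

def rct_rejection_sample(c, t, y, p_star, rng):
    """Subsample RCT rows so that treatment depends on covariate c.

    p_star(c) is a designer-chosen target P*(T=1 | C=c) (assumed interface).
    """
    p_t1 = t.mean()                                   # marginal P(T=1) in the RCT
    p_t = np.where(t == 1, p_t1, 1.0 - p_t1)          # P(T = t_i)
    target1 = p_star(c)
    target = np.where(t == 1, target1, 1.0 - target1)  # P*(T = t_i | C = c_i)
    ratio = target / p_t
    m = ratio.max()                                   # bound on the ratio over the data
    keep = rng.uniform(size=len(t)) < ratio / m       # accept with probability ratio / M
    return c[keep], t[keep], y[keep]

def backdoor_adjusted(c, t, y):
    """Stratify on the discrete covariate c and average the per-stratum effects."""
    estimate = 0.0
    for value in np.unique(c):
        mask = c == value
        estimate += mask.mean() * diff_in_means(t[mask], y[mask])
    return estimate

# Synthetic RCT (assumption): treatment is randomized, true average effect is 1.0.
rng = np.random.default_rng(0)
n = 20_000
c = rng.integers(0, 2, size=n)                  # binary covariate
t = rng.integers(0, 2, size=n)                  # randomized treatment, independent of c
y = 1.0 * t + 2.0 * c + rng.normal(size=n)      # outcome depends on both t and c

def p_star(cov):
    # Target treatment probability that makes c a confounder in the subsample.
    return np.where(cov == 1, 0.8, 0.2)

c_s, t_s, y_s = rct_rejection_sample(c, t, y, p_star, rng)

ate_rct = diff_in_means(t, y)                    # ground truth from the full RCT
ate_naive = diff_in_means(t_s, y_s)              # unadjusted estimate on the subsample
ate_adjusted = backdoor_adjusted(c_s, t_s, y_s)  # estimate adjusted for c

print(f"kept {len(t_s)} of {n} units")
print(f"RCT: {ate_rct:.3f}  naive: {ate_naive:.3f}  adjusted: {ate_adjusted:.3f}")
```

In this toy setup the unadjusted estimate on the subsample is pulled well away from the RCT value because treated units are disproportionately those with `c == 1`, while the estimate that adjusts for `c` should land close to the RCT value. The subsample is also smaller than the original RCT, which is one of the finite-data trade-offs an evaluation designer has to weigh.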

RCT Rejection Sampling for Causal Estimation Evaluation

Precise unbiased estimation in randomized experiments using auxiliary observational data

The covariate-adjusted residual estimator and its use in both randomized trials and observational settings

Adaptive Hybrid Control Design for Comparative Clinical Trials with Historical Control Data

Using Standard Tools from Finite Population Sampling to Improve Causal Inference for Complex Experiments

A Two-Step Framework for Validating Causal Effect Estimates

What can the millions of random treatments in nonexperimental data reveal about causes?

Matching Algorithms for Causal Inference with Multiple Treatments

Rejective Sampling, Rerandomization and Regression Adjustment in Survey Experiments

Causal Inference in Rebuilding and Extending the Recondite Bridge Between Finite Population Sampling and Experimental Design

Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation

An Evaluation Toolkit to Guide Model Selection and Cohort Definition in Causal Inference

Inference of Sample Complier Average Causal Effects in Completely Randomized Experiments

Leveraging Random Assignment to Impute Missing Covariates in Causal Studies

Data-driven Design of Randomized Control Trials with Guaranteed Treatment Effects

genRCT: a statistical analysis framework for generalizing RCT findings to real-world population

Causal Inference With Selectively Deconfounded Data

A robust and efficient approach to causal inference based on sparse sufficient dimension reduction

Design-Based RCT Estimators and Central Limit Theorems for Baseline Subgroup and Related Analyses

Improving transportability of randomized controlled trial inference using robust prediction methods