Abstract: Researchers in many fields endeavor to estimate treatment effects by regressing outcome data (Y) on a treatment (D) and observed confounders (X). Even absent unobserved confounding, the regression coefficient on the treatment reports a weighted average of strata-specific treatment effects (Angrist, 1998). Where heterogeneous treatment effects cannot be ruled out, the resulting coefficient is thus not generally equal to the average treatment effect (ATE), and is unlikely to be the quantity of direct scientific or policy interest. The difference between the coefficient and the ATE has led researchers to propose various interpretational, bounding, and diagnostic aids (Humphreys, 2009; Aronow and Samii, 2016; Sloczynski, 2022; Chattopadhyay and Zubizarreta, 2023). We note that the linear regression of Y on D and X can be misspecified when the treatment effect is heterogeneous in X. The "weights of regression", for which we provide a new (more general) expression, simply characterize how the OLS coefficient will depart from the ATE under the misspecification resulting from unmodeled treatment effect heterogeneity. Consequently, a natural alternative to suffering these weights is to address the misspecification that gives rise to them. For investigators committed to linear approaches, we propose relying on the slightly weaker assumption that the potential outcomes are linear in X. Numerous well-known estimators are unbiased for the ATE under this assumption, namely regression-imputation/g-computation/T-learner, regression with an interaction of the treatment and covariates (Lin, 2013), and balancing weights. Any of these approaches avoids the apparent weighting problem of the misspecified linear regression, at an efficiency cost that will be small when there are few covariates relative to sample size. We demonstrate these lessons using simulations in observational and experimental settings.
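The contrast the abstract draws can be illustrated with a small simulation. The sketch below is not the authors' simulation design: the data-generating process (one binary covariate, propensities of 0.5 and 0.1, stratum effects of 1 and 3), the sample size, and all function names are assumptions chosen for illustration. It compares the coefficient on D from the additive regression of Y on D and X with two of the estimators the abstract names, the interacted regression (Lin, 2013) and regression-imputation/g-computation.

```python
import numpy as np

def ols(design, y):
    """Least-squares coefficients for y regressed on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def simulate(n=200_000, seed=0):
    """Hypothetical DGP: binary X, propensity and treatment effect both vary with X."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, size=n).astype(float)
    p = np.where(x == 1, 0.1, 0.5)        # assumed P(D=1 | X)
    d = rng.binomial(1, p).astype(float)
    tau = np.where(x == 1, 3.0, 1.0)      # assumed stratum-specific effects
    y = 1.0 + 2.0 * x + tau * d + rng.normal(size=n)
    return x, d, y, tau

def additive_ols(x, d, y):
    """Coefficient on D from regressing Y on (1, D, X), with no interaction."""
    design = np.column_stack([np.ones_like(x), d, x])
    return float(ols(design, y)[1])

def lin_interacted(x, d, y):
    """Coefficient on D from regressing Y on (1, D, Xc, D*Xc), Xc = X - mean(X)."""
    xc = x - x.mean()
    design = np.column_stack([np.ones_like(x), d, xc, d * xc])
    return float(ols(design, y)[1])

def g_computation(x, d, y):
    """Fit a linear outcome model per arm, impute both potential outcomes, average."""
    design = np.column_stack([np.ones_like(x), x])
    b1 = ols(design[d == 1], y[d == 1])
    b0 = ols(design[d == 0], y[d == 0])
    return float(np.mean(design @ b1 - design @ b0))

x, d, y, tau = simulate()
results = {
    "sample_ate": float(tau.mean()),
    "additive_ols": additive_ols(x, d, y),
    "lin_interacted": lin_interacted(x, d, y),
    "g_computation": g_computation(x, d, y),
}
for name, value in results.items():
    print(f"{name:>15}: {value:.3f}")
```

Under these assumed parameters the true ATE is 2, while the treatment-variance-weighted average of the stratum effects is (0.25 × 1 + 0.09 × 3) / 0.34 ≈ 1.53, so the additive coefficient should land near 1.53 and the other two estimates near 2. With linear outcome models in each arm, the interacted regression and g-computation coincide up to numerical precision.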

On the implied weights of linear regression for causal inference

Potential weights and implicit causal designs in linear regression

The covariate-adjusted residual estimator and its use in both randomized trials and observational settings

Optimal transport weights for causal inference

Understanding and avoiding the "weights of regression": Heterogeneous effects, misspecification, and longstanding solutions

Independence weights for causal inference with continuous treatments

Towards Representation Learning for Weighting Problems in Design-Based Causal Inference

Balancing Weights for Causal Inference in Observational Factorial Studies

Minimal dispersion approximately balancing weights: asymptotic properties and practical considerations

Doubly Robust Inference in Causal Latent Factor Models

Augmented balancing weights as linear regression

Causal inference under over-simplified longitudinal causal models

Design-Robust Two-Way-Fixed-Effects Regression For Panel Data

Variance Reduction for Causal Inference

On Estimating Regression-Based Causal Effects Using Sufficient Dimension Reduction

Causal Inference with High-dimensional Discrete Covariates

On Flexible Inverse Probability of Treatment and Intensity Weighting: Informative Censoring, Variable Inclusion, and Weight Trimming

Investigating weight constraint methods for causal-formative indicator modeling

On the Use of Two-Way Fixed Effects Regression Models for Causal Inference with Panel Data

Robust weights that optimally balance confounders for estimating marginal hazard ratios

The causal interpretation of estimated associations in regression models