A Lasso-OLS Hybrid Approach to Covariate Selection and Average Treatment Effect Estimation for Clustered RCTs Using Design-Based Methods

Peter Z. Schochet
DOI: https://doi.org/10.48550/arXiv.2005.02502
2020-05-05
Methodology
Abstract: Statistical power is often a concern for clustered RCTs due to variance inflation from design effects and the high cost of adding study clusters (such as hospitals, schools, or communities). While covariate pre-specification is the preferred approach for improving power to estimate regression-adjusted average treatment effects (ATEs), further precision gains can be achieved through covariate selection once primary outcomes have been collected. This article uses design-based methods underlying clustered RCTs to develop a Lasso-OLS hybrid procedure for the post-hoc selection of covariates and ATE estimation that avoids model overfitting and lack of transparency. In the first stage, lasso estimation is conducted using cluster-level averages, where asymptotic normality is proved using a new central limit theorem for finite population regression estimators. In the second stage, ATEs and design-based standard errors are estimated using weighted least squares with the first stage lasso covariates. This nonparametric approach applies to continuous, binary, and discrete outcomes. Simulation results indicate that Type 1 errors of the second stage ATE estimates are near nominal values and standard errors are near true ones, although somewhat conservative with small samples. The method is demonstrated using data from a large, federally funded clustered RCT testing the effects of school-based programs promoting behavioral health.
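The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's estimator: the function names (`lasso_cd`, `lasso_ols_ate`), the within-arm centering used to partial out the treatment indicator, the fixed penalty `lam`, the use of cluster sizes as weights, and the simulated data are all assumptions made for illustration, and the design-based standard errors are omitted entirely.

```python
import numpy as np

def soft_threshold(z, t):
    """Scalar soft-thresholding operator used by the lasso update."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=500):
    """Coordinate descent for (1/(2n))||y - Xb||^2 + lam * ||b||_1, no intercept."""
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0) / n
    r = y.copy()                      # running residual y - Xb
    for _ in range(n_iter):
        for k in range(p):
            if col_ss[k] == 0:
                continue
            r += X[:, k] * b[k]       # add back column k's contribution
            rho = X[:, k] @ r / n
            b[k] = soft_threshold(rho, lam) / col_ss[k]
            r -= X[:, k] * b[k]
    return b

def lasso_ols_ate(ybar, xbar, treat, weights, lam):
    """Stage 1: lasso on cluster-level averages. Stage 2: WLS with selected covariates.

    Assumption: treatment is partialled out in stage 1 by centering within arm.
    """
    treat = treat.astype(float)
    yc = ybar.astype(float).copy()
    xc = xbar.astype(float).copy()
    for arm in (0.0, 1.0):
        mask = treat == arm
        yc[mask] -= yc[mask].mean()
        xc[mask] -= xc[mask].mean(axis=0)
    scale = xc.std(axis=0)
    scale[scale == 0] = 1.0
    beta = lasso_cd(xc / scale, yc, lam)
    selected = np.flatnonzero(beta != 0)

    # Stage 2: weighted least squares of ybar on [1, treat, selected covariates].
    x_sel = xbar[:, selected] - np.average(xbar[:, selected], axis=0, weights=weights)
    design = np.column_stack([np.ones_like(treat), treat, x_sel])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], ybar * sw, rcond=None)
    return coef[1], selected          # coefficient on treat is the ATE estimate

# Simulated cluster-level data (hypothetical): 40 clusters, 6 candidate covariates,
# true ATE of 0.5, and only covariate 0 related to the outcome.
rng = np.random.default_rng(0)
m, p = 40, 6
treat = rng.permutation(np.repeat([0, 1], m // 2))
xbar = rng.normal(size=(m, p))
ybar = 1.0 + 0.5 * treat + 2.0 * xbar[:, 0] + rng.normal(scale=0.1, size=m)
weights = rng.integers(20, 60, size=m)   # assumed cluster sizes

ate, selected = lasso_ols_ate(ybar, xbar, treat, weights, lam=0.2)
print("selected covariates:", selected, "ATE estimate:", round(float(ate), 3))
```

Refitting by least squares on the selected covariates, rather than reporting the lasso coefficients directly, avoids carrying the lasso's shrinkage into the treatment effect estimate. In practice the penalty would be chosen by a data-driven rule rather than fixed in advance.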