Structure-agnostic Optimality of Doubly Robust Learning for Treatment Effect Estimation

Jikai Jin, Vasilis Syrgkanis
2024-03-02
Abstract: Average treatment effect estimation is the most central problem in causal inference with application to numerous disciplines. While many estimation strategies have been proposed in the literature, the statistical optimality of these methods has still remained an open area of investigation, especially in regimes where these methods do not achieve parametric rates. In this paper, we adopt the recently introduced structure-agnostic framework of statistical lower bounds, which poses no structural properties on the nuisance functions other than access to black-box estimators that achieve some statistical estimation rate. This framework is particularly appealing when one is only willing to consider estimation strategies that use non-parametric regression and classification oracles as black-box sub-processes. Within this framework, we prove the statistical optimality of the celebrated and widely used doubly robust estimators for both the Average Treatment Effect (ATE) and the Average Treatment Effect on the Treated (ATT), as well as weighted variants of the former, which arise in policy evaluation.
Machine Learning, Econometrics, Statistics Theory, Methodology
What problem does this paper attempt to address?
This paper addresses the statistical optimality of estimators for the average treatment effect (ATE) and the average treatment effect on the treated (ATT) in causal inference. Specifically, the authors work in the structure-agnostic framework, where non-parametric regression and classification methods are used only as black-box sub-processes, and prove that the well-known doubly robust estimators for the ATE, the ATT, and weighted variants (such as the weighted average treatment effect, WATE, which arises in policy evaluation) are statistically optimal.

### Background and Problem Description of the Paper

1. **Importance of Causal Inference**:
   - Estimating the ATE and the ATT is among the most central problems in causal inference, with applications in fields such as economics, education, epidemiology, and political science.
   - Many estimation strategies have been proposed, but in some regimes they do not achieve parametric rates, and their statistical optimality there remains an open research area.
2. **Structure-Agnostic Framework**:
   - The authors adopt the recently introduced structure-agnostic framework, which makes no structural assumptions about the nuisance functions and only requires access to black-box estimators that achieve some statistical estimation rate.
   - This framework is particularly suitable when one only considers estimation strategies that use non-parametric regression and classification as black-box sub-processes.
3. **Target Problem**:
   - The authors' goal is to derive the statistically optimal estimation rates for the ATE and the ATT, and to study the same question for the WATE.
   - Specifically, they prove the statistical optimality of the doubly robust estimators in the structure-agnostic framework.

### Main Contributions

1. **Proof of Statistical Optimality**:
   - The authors prove that the doubly robust estimators achieve the optimal statistical estimation rate in the structure-agnostic framework.
   - They derive lower bounds for estimating the ATE, the ATT, and the WATE and show that the doubly robust estimators match these lower bounds, so these estimators are statistically optimal in the structure-agnostic framework.
2. **Technical Contributions**:
   - The authors use the method of fuzzy hypotheses to establish the lower bounds, which reduces the problem to testing between mixtures of hypotheses.
   - They design intricate hypothesis constructions based on asymmetric perturbations in the space of nuisance functions. This is challenging for functionals such as the ATE and the ATT, which do not have an inner-product-like form.

### Conclusion

Through rigorous mathematical derivations and hypothesis constructions, this paper proves that, in the structure-agnostic framework, the doubly robust estimators achieve the optimal statistical rate for estimating the ATE, the ATT, and the WATE. The result extends the existing structure-agnostic lower-bound theory and gives theoretical support for using doubly robust estimators in applied causal inference.
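To make the object of study concrete, the following is a minimal sketch of the standard doubly robust (AIPW) estimators for the ATE and the ATT, written so that the nuisance estimates enter only as black-box predictions, in the spirit of the structure-agnostic setting. It is an illustration and not code from the paper: the function names `aipw_ate`, `aipw_att`, and `simulate`, the `clip` parameter, and the synthetic data-generating process are all assumptions made for this example. In practice the nuisance predictions would typically come from models fitted on a separate sample.

```python
import numpy as np


def aipw_ate(y, d, g0_hat, g1_hat, m_hat, clip=1e-3):
    """Doubly robust (AIPW) estimate of the ATE from nuisance predictions.

    y: outcomes; d: binary treatment indicator (0/1);
    g0_hat, g1_hat: predictions of E[Y | D=0, X] and E[Y | D=1, X];
    m_hat: predictions of the propensity score P(D=1 | X).
    `clip` is an illustrative safeguard against dividing by values near 0.
    Returns the point estimate and a plug-in standard error.
    """
    y, d = np.asarray(y, float), np.asarray(d, float)
    g0_hat, g1_hat = np.asarray(g0_hat, float), np.asarray(g1_hat, float)
    m = np.clip(np.asarray(m_hat, float), clip, 1.0 - clip)
    # Plug-in regression difference plus inverse-propensity-weighted residuals.
    scores = (g1_hat - g0_hat
              + d * (y - g1_hat) / m
              - (1.0 - d) * (y - g0_hat) / (1.0 - m))
    return scores.mean(), scores.std(ddof=1) / np.sqrt(len(scores))


def aipw_att(y, d, g0_hat, m_hat, clip=1e-3):
    """Doubly robust estimate of the ATT (needs only g0_hat and m_hat)."""
    y, d = np.asarray(y, float), np.asarray(d, float)
    g0_hat = np.asarray(g0_hat, float)
    m = np.clip(np.asarray(m_hat, float), clip, 1.0 - clip)
    resid = y - g0_hat
    # Treated residuals minus odds-weighted control residuals, over P(D=1).
    numer = d * resid - (1.0 - d) * m / (1.0 - m) * resid
    return numer.mean() / d.mean()


def simulate(n=20000, seed=0):
    """Hypothetical synthetic data with a constant treatment effect of 2."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    m_true = 1.0 / (1.0 + np.exp(-x))      # true propensity score
    d = rng.binomial(1, m_true)
    g0_true, g1_true = x, x + 2.0          # true outcome regressions
    y = np.where(d == 1, g1_true, g0_true) + rng.normal(size=n)
    return y, d, g0_true, g1_true, m_true


if __name__ == "__main__":
    y, d, g0, g1, m = simulate()
    print("ATE with correct nuisances:", aipw_ate(y, d, g0, g1, m))
    # A deliberately wrong propensity model, but correct outcome models.
    print("ATE with wrong propensity: ", aipw_ate(y, d, g0, g1, np.full(len(y), 0.5)))
    print("ATT with correct nuisances:", aipw_att(y, d, g0, m))
```

The second call illustrates the double robustness property: the estimate stays centered on the true effect when one of the two nuisance models is wrong, as long as the other is correct. More generally, the error of this estimator depends on the product of the two nuisance estimation errors, which is the kind of rate the paper's lower bounds are matched against.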