The Value Added of Machine Learning to Causal Inference: Evidence from Revisited Studies

Anna Baiardi, Andrea A. Naghi
DOI: https://doi.org/10.1093/ectj/utae004
2024-02-06
Abstract: A new and rapidly growing econometric literature is making advances in the problem of using machine learning methods for causal inference questions. Yet, the empirical economics literature has not started to fully exploit the strengths of these modern methods. We revisit influential empirical studies with causal machine learning methods aiming to connect the econometric theory on these methods with empirical economics. We focus on the double machine learning, causal forest and generic machine learning methods, in the context of both average and heterogeneous treatment effects. We illustrate the implementation of these methods in a variety of settings and highlight the relevance and value added relative to traditional methods used in the original studies.
Economics; Social Sciences, Mathematical Methods; Mathematics, Interdisciplinary Applications; Statistics & Probability
What problem does this paper attempt to address?
The paper asks how much value machine-learning methods add to causal inference. The authors re-examine several influential empirical studies with causal machine-learning methods and assess what these methods contribute relative to the traditional methods used in the original analyses. They focus on average treatment effects (ATE) and heterogeneous treatment effects (HTE), using double machine learning (DML), causal forests and the generic machine learning approach.

### Main objectives of the paper

1. **Demonstrate the advantages of causal machine-learning methods in practice**: By re-analyzing existing empirical studies, the authors show how these methods handle complex interactions, high-dimensional data, systematic model selection and the estimation of heterogeneous treatment effects.
2. **Compare causal machine-learning methods with traditional methods**: Using Monte Carlo simulations, the authors evaluate how causal machine-learning methods perform relative to traditional methods under different data-generating processes.
3. **Provide practical advice**: Based on the re-analyses, the authors offer recommendations and cautions for applied researchers who use causal machine-learning methods.

### Specific problems

- **Average treatment effects (ATE)**: The authors re-analyze two observational studies: Djankov et al. (2010a) on the effect of corporate taxes on investment and entrepreneurship, and Nunn and Trefler (2010a) on the effect of skill-biased tariffs on long-term economic growth.
- **Heterogeneous treatment effects (HTE)**: The authors revisit two studies: DellaVigna and Kaplan (2007a) on the effect of Fox News on the Republican vote share, and Loyalka et al. (2019a) on the effect of teacher-training interventions on student performance.
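Double machine learning targets average effects such as those in the two observational re-analyses. The sketch below illustrates its core idea for the partially linear model Y = θD + g(X) + ε under stated assumptions: a single scalar control, simulated (hypothetical) data, and k-nearest-neighbour regression standing in for whatever learners (Lasso, random forests, boosting) an applied study would choose. Both E[Y | X] and E[D | X] are fitted on the other folds (cross-fitting), and θ is then estimated by regressing outcome residuals on treatment residuals. The function names `knn_neighbours` and `dml_plr` are hypothetical, not part of any package.

```python
import math
import random


def knn_neighbours(train_x, x, k):
    """Positions (within train_x) of the k points closest to the scalar x."""
    return sorted(range(len(train_x)), key=lambda j: abs(train_x[j] - x))[:k]


def dml_plr(y, d, x, n_folds=5, k=25, seed=0):
    """Cross-fitted partialling-out estimate of theta in Y = theta*D + g(X) + e.

    Both nuisance regressions, E[Y|X] and E[D|X], use k-nearest neighbours,
    an illustrative stand-in for Lasso, random forests, boosting, etc.
    """
    n = len(y)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[f::n_folds] for f in range(n_folds)]
    y_res, d_res = [0.0] * n, [0.0] * n
    for f, test in enumerate(folds):
        # Cross-fitting: learn nuisances on the other folds, predict this one.
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        train_x = [x[i] for i in train]
        for i in test:
            nb = [train[j] for j in knn_neighbours(train_x, x[i], k)]
            y_res[i] = y[i] - sum(y[j] for j in nb) / k
            d_res[i] = d[i] - sum(d[j] for j in nb) / k
    # Residual-on-residual regression: the Neyman-orthogonal partialling-out score.
    den = sum(v * v for v in d_res)
    theta = sum(u * v for u, v in zip(y_res, d_res)) / den
    # Plug-in standard error based on the same score.
    psi_sq = sum(((u - theta * v) * v) ** 2 for u, v in zip(y_res, d_res))
    se = math.sqrt(psi_sq) / den
    return theta, se


# Hypothetical data: X drives both the treatment and, non-linearly, the outcome.
rng = random.Random(42)
n, theta_true = 800, 0.5
x = [rng.uniform(0.0, 3.0) for _ in range(n)]
d = [xi + rng.gauss(0.0, 1.0) for xi in x]                 # m(X) = X
y = [theta_true * di + xi ** 2 + rng.gauss(0.0, 1.0)        # g(X) = X^2
     for xi, di in zip(x, d)]

# Naive slope of Y on D that ignores X (confounded).
d_bar, y_bar = sum(d) / n, sum(y) / n
naive = (sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
         / sum((di - d_bar) ** 2 for di in d))

theta_hat, se_hat = dml_plr(y, d, x)
print(f"naive OLS: {naive:.2f}  DML: {theta_hat:.2f} (se {se_hat:.2f})")
```

In this simulated design the population value of the naive slope is about 1.79, because X raises both D and Y, whereas the cross-fitted estimate should land close to the true θ = 0.5. Cross-fitting matters here: predicting each unit's nuisances from models that never saw that unit prevents overfitting in the learners from leaking into the estimate of θ.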
### Main findings

1. **Handling complex interactions**: Causal machine-learning methods estimate the relationships between the outcome, the treatment and the covariates more flexibly, which reduces the bias from omitted nonlinear terms and interactions among the controls.
2. **Handling high-dimensional data**: When the number of covariates is large relative to the sample size, these methods assume that the model is (approximately) sparse and rely on regularized regression such as the Lasso, which improves estimation accuracy.
3. **Systematic model selection**: Many machine-learning methods choose the functional form by estimating and comparing multiple model specifications, for example through cross-validation, so model selection is data-driven and fully documented.
4. **Estimating heterogeneous treatment effects**: Causal machine-learning methods can systematically consider many covariates as potential sources of treatment-effect heterogeneity, which reduces the risk of missing important heterogeneous effects.

Through these analyses, the authors show the potential advantages of causal machine-learning methods in empirical research, especially for handling complex data and improving estimation accuracy.
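Points 2 and 3 can be illustrated together. The sketch below fits a Lasso by cyclic coordinate descent and picks the penalty λ by 5-fold cross-validation, on simulated (hypothetical) data in which only two of ten covariates matter. It is a pure-Python illustration under those assumptions, not necessarily the learner or tuning rule used in the paper; applied work would normally use a library implementation such as scikit-learn's `LassoCV`. The helper names are hypothetical.

```python
import random


def soft_threshold(z, t):
    """Proximal operator of the L1 penalty: shrink z towards 0 by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, max_sweeps=500, tol=1e-8):
    """Minimize (1/2n)*||y - a - X b||^2 + lam*||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centring absorbs a
    r = [yi - y_mean for yi in y]                                 # residuals at b = 0
    col_sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(max_sweeps):
        max_step = 0.0
        for j in range(p):
            # Correlation of x_j with the partial residual that excludes x_j.
            rho = sum(Xc[i][j] * (r[i] + Xc[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            step = new_bj - b[j]
            if step != 0.0:
                for i in range(n):
                    r[i] -= Xc[i][j] * step
                b[j] = new_bj
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    a = y_mean - sum(m * bj for m, bj in zip(x_mean, b))
    return a, b


def cv_lasso(X, y, lambdas, n_folds=5, seed=0):
    """Choose lambda by K-fold cross-validated mean squared prediction error."""
    n = len(y)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[f::n_folds] for f in range(n_folds)]
    cv_mse = {}
    for lam in lambdas:
        sq_err = 0.0
        for f, test in enumerate(folds):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            a, b = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
            for i in test:
                pred = a + sum(xij * bj for xij, bj in zip(X[i], b))
                sq_err += (y[i] - pred) ** 2
        cv_mse[lam] = sq_err / n
    return min(cv_mse, key=cv_mse.get), cv_mse


# Hypothetical sparse design: only the first two covariates matter.
rng = random.Random(7)
n, p = 150, 10
beta_true = [2.0, -1.5] + [0.0] * (p - 2)
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(xj * bj for xj, bj in zip(row, beta_true)) + rng.gauss(0.0, 1.0)
     for row in X]

lambdas = [0.01, 0.03, 0.1, 0.3, 1.0]
best_lam, cv_mse = cv_lasso(X, y, lambdas)
_, b_cv = lasso_cd(X, y, best_lam)   # refit on all data at the CV choice
_, b_sparse = lasso_cd(X, y, 0.5)    # a heavier penalty, for comparison
print("CV-chosen lambda:", best_lam)
print("nonzero at CV lambda:", sum(bj != 0.0 for bj in b_cv))
print("nonzero at lambda=0.5:", sum(bj != 0.0 for bj in b_sparse))
```

Soft-thresholding is what produces exact zeros: a coefficient stays at 0 whenever the absolute correlation of its covariate with the partial residual is below λ, which is how the Lasso encodes the sparsity assumption. The cross-validation loop makes the choice of λ explicit and reproducible, which is the sense in which model selection becomes data-driven and documented.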
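For point 4, the generic machine learning approach summarizes heterogeneity through sorted group average treatment effects (GATES): learn a proxy for the conditional treatment effect on one half of the sample, sort the other half by that proxy, and estimate the average effect within each group. The sketch below is a simplified version under explicit assumptions: a randomized trial with a constant treatment probability, one sample split rather than many splits aggregated by medians, k-nearest-neighbour outcome models per arm as the proxy learner, and within-group differences in means in place of the method's weighted regression (under constant-probability randomization both target the same group averages). The data, with effect τ(z) = 2z, and the function names are hypothetical.

```python
import random


def knn_mean(points, z, k):
    """Mean outcome of the k points (z_i, y_i) whose covariate is closest to z."""
    nearest = sorted(points, key=lambda pt: abs(pt[0] - z))[:k]
    return sum(yi for _, yi in nearest) / len(nearest)


def gates(z, d, y, n_groups=4, k=50, seed=0):
    """Simplified sorted group average treatment effects for a randomized trial."""
    n = len(y)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    aux, main = order[: n // 2], order[n // 2:]
    # Step 1: on the auxiliary half, keep separate outcome data for each arm.
    treated = [(z[i], y[i]) for i in aux if d[i] == 1]
    control = [(z[i], y[i]) for i in aux if d[i] == 0]
    # Step 2: on the main half, compute the proxy S(z) = mu1(z) - mu0(z).
    proxy = {i: knn_mean(treated, z[i], k) - knn_mean(control, z[i], k)
             for i in main}
    # Step 3: sort main-sample units by the proxy and cut into equal-sized groups.
    ranked = sorted(main, key=proxy.get)
    size = len(ranked) // n_groups
    effects = []
    for g in range(n_groups):
        end = (g + 1) * size if g < n_groups - 1 else len(ranked)
        members = ranked[g * size:end]
        y1 = [y[i] for i in members if d[i] == 1]
        y0 = [y[i] for i in members if d[i] == 0]
        # Step 4: with a constant randomization probability, the within-group
        # difference in means estimates the group average treatment effect.
        effects.append(sum(y1) / len(y1) - sum(y0) / len(y0))
    return effects


# Hypothetical randomized trial whose effect grows with z: tau(z) = 2z.
rng = random.Random(1)
n = 2000
z = [rng.random() for _ in range(n)]
d = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
y = [zi + 2.0 * zi * di + rng.gauss(0.0, 1.0) for zi, di in zip(z, d)]

group_effects = gates(z, d, y)
print([round(e, 2) for e in group_effects])
```

If the proxy captures real heterogeneity, the estimated effects should rise from the first to the last group; roughly flat group effects would suggest that the proxy found little systematic heterogeneity. Estimating the groups on a half of the data the proxy never saw keeps the group comparison honest.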