Causality and Statistical Learning

Andrew Gelman
DOI: https://doi.org/10.48550/arXiv.1003.2619
2010-03-13
Abstract: We review some approaches and philosophies of causal inference coming from sociology, economics, computer science, cognitive science, and statistics.
Statistics Theory, Methodology
What problem does this paper attempt to address?
This paper explores how causal inference is applied in the social sciences and the challenges it faces. The author, Andrew Gelman, divides causal inference into two broad categories of problems:

1. **Forward Causal Inference**: "What will happen if X is done?" Examples include the effect of smoking on health, of education on knowledge, and of election campaigns on election results. Such problems are usually studied through randomized experiments or so-called natural experiments that estimate the effects of specific interventions.
2. **Reverse Causal Inference**: "What are the causes of Y?" For example, why do more attractive people earn more money? Why do many poor people vote for the Republican Party while the rich vote for the Democratic Party? What are the causes of economic collapse? Such problems are more complex because causal relationships are usually multifaceted and difficult to trace back clearly.

The paper further discusses three main problems in causal inference:

- **Forward Causal Inference Using Observational Data or Experiments with Missing Data**: This is the traditional focus of the statistics and biostatistics literature, because missing data are inherent in the counterfactual definition of a causal effect: for each unit, only the outcome under the treatment actually received is observed.
- **Generalizing from Experiments or Quasi-Experiments to Real-World Scenarios**: This is the problem of external validity, which was emphasized by Heckman (2006). Experiments are often not a good way to understand how effects change across different situations.
- **Studying Reverse Causal Relationships Using Multivariate Analysis of Observational Data**: Many factors threaten the validity of the model here, making it difficult to build confidence in any estimate.
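The missing-data view of the forward problem can be illustrated with a small simulation. The following is a minimal sketch and is not taken from the paper: the constant effect of 2.0, the unobserved confounder `u`, and the function name `simulate` are all assumptions chosen for illustration. Each unit has two potential outcomes, but only one is ever observed. With randomized assignment the difference in means lands close to the true effect, while assignment that depends on the confounder produces a biased estimate.

```python
import random


def simulate(n=20000, seed=0):
    """Compare randomized and confounded treatment assignment (illustrative only)."""
    rng = random.Random(seed)
    true_effect = 2.0  # assumed constant treatment effect (hypothetical value)

    # Each unit carries both potential outcomes; in real data one is always missing.
    units = []
    for _ in range(n):
        u = rng.gauss(0, 1)       # unobserved confounder
        y0 = u + rng.gauss(0, 1)  # potential outcome without treatment
        y1 = y0 + true_effect     # potential outcome with treatment
        units.append((u, y0, y1))

    def diff_in_means(assign):
        treated, control = [], []
        for u, y0, y1 in units:
            if assign(u):
                treated.append(y1)  # y0 is unobserved for this unit
            else:
                control.append(y0)  # y1 is unobserved for this unit
        return sum(treated) / len(treated) - sum(control) / len(control)

    # Randomized experiment: assignment ignores the confounder.
    randomized = diff_in_means(lambda u: rng.random() < 0.5)
    # Observational data: units with a high confounder value select into treatment.
    confounded = diff_in_means(lambda u: u > 0)
    return true_effect, randomized, confounded


if __name__ == "__main__":
    true_effect, randomized, confounded = simulate()
    print(f"true effect:         {true_effect:.2f}")
    print(f"randomized estimate: {randomized:.2f}")
    print(f"confounded estimate: {confounded:.2f}")
```

In this setup the confounded comparison overstates the effect because treated units start from a higher baseline, which is the kind of threat to validity that the observational-data problems above describe.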
The author also discusses different scholars' attitudes towards causal inference, from the conservative statistical view (such as Heckman's) to the more liberal epidemiological view (such as that of Greenland and Robins), as well as the views of social psychologists and of supporters of structural equation models. Finally, the paper emphasizes that causal structure cannot be learned from observational data alone; it must be combined with experiments or hypothetical experiments to verify hypotheses. The author evaluates Pearl's causal graph model and Rubin's potential outcome framework and presents his own conflicted views on causal inference, arguing that it is almost impossible to draw strong causal inferences from observational data without strong theoretical support. In short, the paper examines the complexity and diversity of causal inference in the social sciences and the different methods and techniques proposed to address these problems.