Causal-StoNet: Causal Inference for High-Dimensional Complex Data

Yaxin Fang, Faming Liang
2024-03-28
Abstract: With the advancement of data science, the collection of increasingly complex datasets has become commonplace. In such datasets, the data dimension can be extremely high, and the underlying data generation process can be unknown and highly nonlinear. As a result, the task of making causal inference with high-dimensional complex data has become a fundamental problem in many disciplines, such as medicine, econometrics, and social science. However, the existing methods for causal inference are frequently developed under the assumption that the data dimension is low or that the underlying data generation process is linear or approximately linear. To address these challenges, this paper proposes a novel causal inference approach for dealing with high-dimensional complex data. The proposed approach is based on deep learning techniques, including sparse deep learning theory and stochastic neural networks, that have been developed in recent literature. By using these techniques, the proposed approach can address both the high dimensionality and unknown data generation process in a coherent way. Furthermore, the proposed approach can also be used when missing values are present in the datasets. Extensive numerical studies indicate that the proposed approach outperforms existing ones.
Machine Learning
What problem does this paper attempt to address?
This paper addresses causal inference for high-dimensional complex data. As data science has progressed, collected datasets have grown more complex: the data dimension can be extremely high, and the data generation process can be unknown and highly nonlinear. Causal inference on such data has therefore become a fundamental problem in disciplines such as medicine, econometrics, and social science. Existing causal inference methods, however, usually assume that the data dimension is low or that the data generation process is linear or approximately linear, and these assumptions often fail for modern high-dimensional complex data.

To meet these challenges, the paper proposes a new deep-learning-based causal inference method called Causal-StoNet (Causal Stochastic Neural Network). The method combines sparse deep learning theory with stochastic neural networks, so it can handle high dimensionality and an unknown data generation process at the same time, and it can also be applied when the dataset contains missing values. Extensive numerical studies indicate that the method outperforms existing ones.

The main contributions of Causal-StoNet are:

1. **Natural forward modeling framework**: By replacing a hidden neuron in one hidden layer with the visible treatment variable, Causal-StoNet provides a natural forward modeling framework that is suitable for complex data generation processes.
2. **Universal approximation ability**: Causal-StoNet is proved to have the same universal approximation ability as deep neural networks, so it can approximate both the outcome function and the propensity score function.
3. **Consistent sparse learning**: By imposing appropriate sparse penalties or priors on the structure of Causal-StoNet, the variables relevant to the outcome and the propensity score can be identified even with high-dimensional covariates, so that both the outcome and the propensity score are estimated correctly.

In conclusion, Causal-StoNet addresses high-dimensional covariates, unknown functional forms, and missing data in a unified way, and provides a robust and reliable method for causal inference with high-dimensional complex data.
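To make the first contribution concrete, the following is a minimal structural sketch in Python with NumPy, not the authors' implementation. It assumes a two-hidden-layer network in which the last unit of the second hidden layer is the observed treatment: the pre-activation of that unit gives the propensity score, and the observed treatment value is fed forward to the outcome. The names `init_params` and `forward`, the layer sizes, and the `tanh` activation are hypothetical choices for illustration, and the sketch is deterministic, so it omits the stochastic noise, the sparse penalties or priors, and the training procedure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_covariates, n_hidden1, n_hidden2, rng):
    # Hypothetical helper: n_hidden2 counts the treatment unit, so it must be >= 2.
    return {
        "W1": rng.normal(scale=0.5, size=(n_covariates, n_hidden1)),
        "b1": np.zeros(n_hidden1),
        "W2": rng.normal(scale=0.5, size=(n_hidden1, n_hidden2)),
        "b2": np.zeros(n_hidden2),
        "w_y": rng.normal(scale=0.5, size=n_hidden2),
        "b_y": 0.0,
    }

def forward(params, X, A):
    """Return (propensity, outcome) for covariates X and binary treatment A."""
    # First hidden layer: ordinary latent units computed from the covariates.
    H1 = np.tanh(X @ params["W1"] + params["b1"])
    # Second hidden layer: the last unit is the visible treatment variable.
    pre2 = H1 @ params["W2"] + params["b2"]
    propensity = sigmoid(pre2[:, -1])   # model of P(A = 1 | X)
    latent = np.tanh(pre2[:, :-1])      # remaining units stay latent
    # The observed treatment replaces the activation of the treatment unit.
    layer2 = np.concatenate([latent, A[:, None]], axis=1)
    outcome = layer2 @ params["w_y"] + params["b_y"]
    return propensity, outcome

# Synthetic inputs and untrained weights, used only to exercise the structure.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 20))
A = rng.integers(0, 2, size=8).astype(float)
params = init_params(n_covariates=20, n_hidden1=6, n_hidden2=4, rng=rng)

propensity, outcome = forward(params, X, A)

# Plug-in contrast: predicted outcome with everyone treated vs. untreated.
_, y1 = forward(params, X, np.ones(8))
_, y0 = forward(params, X, np.zeros(8))
plug_in_contrast = float(np.mean(y1 - y0))
```

Because the output layer in this sketch is linear in the treatment unit, the contrast `y1 - y0` is the same constant for every sample (the weight on the treatment unit). A richer model would place further hidden layers after the treatment layer so that the effect can vary with the covariates.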