A New Covariate Selection Strategy for High Dimensional Data in Causal Effect Estimation with Multivariate Treatments

Juan Chen, Yingchun Zhou
DOI: https://doi.org/10.48550/arXiv.2303.09766
2023-03-17
Abstract: Selection of covariates is crucial in the estimation of average treatment effects from observational data with high- or even ultra-high-dimensional pretreatment variables. Existing methods for this problem typically assume sparse linear models for both the outcome and a univariate treatment, and cannot handle ultra-high-dimensional covariates. In this paper, we propose a new covariate selection strategy called double screening prior adaptive lasso (DSPAL) to select confounders and predictors of the outcome for multivariate treatments. DSPAL combines the adaptive lasso with marginal and conditional (in)dependence prior information to select the target covariates, in order to eliminate confounding bias and improve statistical efficiency. The distinctive features of our proposal are that it can be applied to high-dimensional or even ultra-high-dimensional covariates for multivariate treatments, and that it can deal with both parametric and nonparametric outcome models, which makes it more robust than other methods. Our theoretical analyses show that the proposed procedure enjoys the sure screening property, the ranking consistency property and variable selection consistency. Through a simulation study, we demonstrate that the proposed approach consistently selects all confounders and predictors, and estimates the multivariate treatment effects with smaller bias and mean squared error than several alternatives under various scenarios. In a real data analysis, the method is applied to estimate the causal effect of a three-dimensional continuous environmental treatment on cholesterol level, and informative results are obtained.
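The abstract describes DSPAL as combining the adaptive lasso with (in)dependence prior information. As a rough illustration only, the sketch below minimizes (1/(2n)) * ||y - X b||^2 + lam * sum_j w_j * |b_j| by coordinate descent and encodes "prior information" in the simplest way we could think of: covariates in a hypothetical prior set `prior_idx` get weight zero (unpenalized), while the others get the usual adaptive weights 1/|b_init_j|^gamma from a ridge initial fit. The function names, the ridge initial estimator and this weighting rule are our assumptions, not the paper's exact specification.

```python
import numpy as np

def weighted_lasso_cd(X, y, lam, weights, n_iter=500, tol=1e-8):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * sum_j weights[j] * |b_j|.
    Assumes the columns of X and y are centered (no intercept is fitted)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n        # (1/n) * ||X_j||^2
    resid = np.array(y, dtype=float)         # residual y - X @ beta, with beta = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of X_j with the partial residual that excludes X_j's own fit.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-thresholding with the coordinate-specific penalty lam * weights[j].
            new = np.sign(rho) * max(abs(rho) - lam * weights[j], 0.0) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid -= X[:, j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta

def prior_adaptive_lasso(X, y, lam, prior_idx, gamma=1.0, ridge=1e-3):
    """Hypothetical 'prior' adaptive lasso: covariates in prior_idx are unpenalized,
    the rest get adaptive weights 1 / |b_init_j|**gamma from a ridge initial fit."""
    n, p = X.shape
    b_init = np.linalg.solve(X.T @ X + ridge * n * np.eye(p), X.T @ y)
    weights = 1.0 / (np.abs(b_init) + 1e-6) ** gamma
    weights[list(prior_idx)] = 0.0           # assumption: prior-selected => no penalty
    return weighted_lasso_cd(X, y, lam, weights)

# Toy usage: 3 active covariates out of 50; covariate 0 is treated as prior-selected.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[[0, 1, 2]] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.5 * rng.standard_normal(n)
X, y = X - X.mean(axis=0), y - y.mean()
beta_hat = prior_adaptive_lasso(X, y, lam=0.1, prior_idx=[0])
print(np.flatnonzero(beta_hat))
```

Setting a weight to zero leaves that coordinate unpenalized, so it behaves like an ordinary least squares coefficient, while large adaptive weights on covariates with tiny initial estimates push them to exactly zero.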
Methodology
What problem does this paper attempt to address?
This paper addresses covariate selection for estimating the causal effects of multivariate treatments when the pretreatment covariates are high-dimensional or even ultra-high-dimensional. It proposes a new covariate selection strategy, **Double Screening Prior Adaptive Lasso (DSPAL)**, which selects confounders and outcome predictors in order to eliminate confounding bias and improve statistical efficiency.

#### Background and Challenges

1. **High-dimensional data**: Existing methods usually assume sparse linear models for both the outcome and a univariate treatment, and cannot handle ultra-high-dimensional covariates.
2. **Multivariate treatments**: Most existing methods target univariate treatments, whereas this paper focuses on multivariate continuous treatments.
3. **Confounding**: Confounders in observational data bias causal effect estimates, so they must be identified and controlled.

#### Solutions

DSPAL has the following characteristics:

- **Wide applicability**: It applies to multivariate treatments with high-dimensional or ultra-high-dimensional covariates.
- **Model flexibility**: It handles both parametric and nonparametric outcome models, which makes it more robust than competing methods.
- **Theoretical guarantees**: The authors prove that the procedure enjoys the sure screening property, the ranking consistency property and variable selection consistency.

#### Method Overview

1. **Independence screening**: Retain a set of covariates containing all confounders and instrumental variables, based on a multi-treatment canonical correlation statistic.
2. **Conditional independence screening**: Retain a set of covariates containing all confounders and outcome predictors, based on generalized covariance measure (GCM) statistics.
3. **Prior adaptive lasso**: Combine the adaptive lasso with marginal and conditional (in)dependence prior information to exclude any remaining spurious variables and instrumental variables.
4. **Causal effect estimation**: Using the selected covariates, estimate the causal effects of the multivariate continuous treatment with entropy balancing.

#### Experimental Verification

In simulation studies across several scenarios, DSPAL consistently selects all confounders and outcome predictors, and its estimates of the multivariate treatment effects have small bias and mean squared error compared with several alternatives.

### Summary

DSPAL is a covariate selection strategy for estimating the causal effects of multivariate treatments with high-dimensional or ultra-high-dimensional covariates, supported by both theoretical guarantees (sure screening, ranking consistency and selection consistency) and simulation evidence.
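The first two steps of the method overview rank each covariate by a marginal statistic. The sketch below shows one plausible way to compute them, under assumptions stated here rather than taken from the paper: (i) the canonical correlation between a single covariate X_j and the d-dimensional treatment T is taken to be the first canonical correlation, which for a scalar X_j reduces to the multiple correlation of X_j regressed on T; (ii) the GCM statistic of Shah and Peters (2020) is used to test Y independent of X_j given T, with linear nuisance regressions, although the paper's conditioning set and regression method may differ; (iii) `double_screen` and its `keep` cut-off are hypothetical names and tuning choices.

```python
import numpy as np

def ols_residuals(target, Z):
    """Residuals of an OLS fit (with intercept) of `target` on the columns of Z."""
    Z1 = np.column_stack([np.ones(Z.shape[0]), Z])
    coef, *_ = np.linalg.lstsq(Z1, target, rcond=None)
    return target - Z1 @ coef

def canonical_corr_stat(x, T):
    """First canonical correlation between a scalar covariate x and the n x d treatment
    matrix T; for scalar x it equals the multiple correlation sqrt(R^2) of x on T."""
    r = ols_residuals(x, T)
    xc = x - x.mean()
    return np.sqrt(max(1.0 - (r @ r) / (xc @ xc), 0.0))

def gcm_stat(y, x, Z):
    """Generalized covariance measure statistic for 'y independent of x given Z',
    here with linear (OLS) nuisance regressions."""
    R = ols_residuals(y, Z) * ols_residuals(x, Z)
    return np.sqrt(len(R)) * R.mean() / np.sqrt((R ** 2).mean() - R.mean() ** 2)

def double_screen(X, T, y, keep):
    """Hypothetical screening: keep the `keep` top-ranked covariates under each statistic."""
    p = X.shape[1]
    cc = np.array([canonical_corr_stat(X[:, j], T) for j in range(p)])
    gcm = np.array([abs(gcm_stat(y, X[:, j], T)) for j in range(p)])
    step1 = set(np.argsort(-cc)[:keep].tolist())
    step2 = set(np.argsort(-gcm)[:keep].tolist())
    return step1, step2

# Toy data: X0, X1 are confounders, X2 is an instrument, X3 is an outcome predictor.
rng = np.random.default_rng(1)
n, p = 500, 100
X = rng.standard_normal((n, p))
T = np.column_stack([X[:, 0] + X[:, 2], X[:, 1] + X[:, 2], X[:, 0] - X[:, 1]])
T = T + rng.standard_normal((n, 3))
y = T.sum(axis=1) + 2 * X[:, 0] + 2 * X[:, 1] + 2 * X[:, 3] + rng.standard_normal(n)
step1, step2 = double_screen(X, T, y, keep=5)
print(sorted(step1), sorted(step2))
```

In this toy setup the GCM step also tends to retain the instrument X2: conditioning on T links it to the confounders, which illustrates why a third step is needed to remove instrumental variables.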