Sure Independence Screening Adjusted for Confounding Covariates with Ultrahigh-dimensional Data

Xueqin Wang, Canhong Wen, Wenliang Pan, Mian Huang
DOI: https://doi.org/10.5705/ss.202014.0117
IF: 1.4
2018-01-01
Statistica Sinica
Abstract: Detecting candidate genetic variants in genomic studies often runs into confounding problems, particularly when the data are ultrahigh dimensional. Confounding covariates, such as age and gender, can not only reduce statistical power but also introduce spurious genetic associations. How to control for confounders in ultrahigh-dimensional data analysis is a critical and challenging issue. In this paper, we propose a novel sure independence screening method based on conditional distance correlation under the ultrahigh-dimensional model setting. Our proposal accomplishes the adjustment by conditioning on the confounding variables. Because conditional distance correlation is model-free, our method does not need any parametric modeling assumptions and is thus quite flexible. In addition, it is applicable to data with multivariate responses. We show that, under some mild technical conditions, the proposed method enjoys the sure screening property even when the dimensionality is of an exponential order of the sample size. Simulation studies and a data analysis demonstrate that the proposed procedure has competitive performance.
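The screening idea summarized in the abstract can be illustrated with a rough sketch. The code below is not the authors' estimator: it is a simplified stand-in, assuming a Gaussian kernel on the confounder, a rule-of-thumb bandwidth, and a kernel-weighted distance correlation averaged over the observed confounder values. The function names (`cdc_screen`, `local_dcor`), the bandwidth rule, and the `top_k` cutoff are all hypothetical choices made for illustration.

```python
import numpy as np

def _pairwise_dist(a):
    # Euclidean distance matrix; accepts 1-D or 2-D input (rows = samples).
    a = np.asarray(a, dtype=float)
    a = a.reshape(len(a), -1)
    diff = a[:, None, :] - a[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def _weighted_center(d, w):
    # Double-center a symmetric distance matrix using weights w (sum to 1).
    row = d @ w
    total = w @ row
    return d - row[:, None] - row[None, :] + total

def local_dcor(dx, dy, w):
    # Distance correlation of the discrete distribution that puts
    # mass w[i] on sample i (assumed stand-in for a local estimate).
    a = _weighted_center(dx, w)
    b = _weighted_center(dy, w)
    ww = np.outer(w, w)
    cov = (ww * a * b).sum()
    denom = np.sqrt((ww * a * a).sum() * (ww * b * b).sum())
    return np.sqrt(max(cov, 0.0) / denom) if denom > 0 else 0.0

def cdc_screen(X, y, Z, bandwidth=None, top_k=10):
    # Rank each column of X by its kernel-weighted distance correlation
    # with y, averaged over conditioning points Z_1, ..., Z_n.
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if bandwidth is None:
        # Assumed rule of thumb, not taken from the paper.
        bandwidth = float(np.std(Z)) * n ** (-0.2)
    K = np.exp(-0.5 * (_pairwise_dist(Z) / bandwidth) ** 2)
    W = K / K.sum(axis=1, keepdims=True)  # row k: weights centered at Z_k
    dy = _pairwise_dist(y)
    scores = np.empty(p)
    for j in range(p):
        dx = _pairwise_dist(X[:, j])
        scores[j] = np.mean([local_dcor(dx, dy, W[k]) for k in range(n)])
    order = np.argsort(scores)[::-1][:top_k]
    return order, scores
```

Calling `cdc_screen(X, y, Z, top_k=d)` returns the indices of the `d` highest-scoring predictors together with all scores. The cost grows as roughly O(p n^3), so this sketch is only suitable for small illustrative data sets.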