Factor Profiling for Ultra High Dimensional Variable Selection

Hansheng Wang
DOI: https://doi.org/10.2139/ssrn.1613452
2010-01-01
SSRN Electronic Journal
Abstract: We propose a novel method, factor profiling (FP), for ultra high dimensional variable selection. The method assumes that the correlation structure of the high dimensional data can be well represented by a set of low-dimensional latent factors (Fan et al., 2008). These latent factors can be estimated consistently by eigenvalue-eigenvector decomposition and are then profiled out from both the response and the predictors; we refer to this operation as FP. Under this factor structure, FP produces uncorrelated predictors, so sure independence screening (SIS; Fan and Lv, 2008) can be applied directly. This leads to profiled independent screening (PIS). PIS is shown to be selection consistent even when the predictor dimension is substantially larger than the sample size. To further improve PIS, we propose a novel method of profiled sequential screening (PSS). PSS shares the strengths of forward regression (Wang, 2009a) but is computationally even simpler. Numerical studies are presented to corroborate our theoretical findings.
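The FP and PIS steps described in the abstract can be illustrated with a minimal sketch. It is not the paper's exact estimator: the function names (`factor_profile`, `profiled_independent_screening`), the simulated data, and the assumption that the number of factors is known in advance are all hypothetical choices made for illustration. Factor scores are estimated here from the leading eigenvectors of the sample Gram matrix, then projected out of the response and predictors before marginal correlation screening.

```python
import numpy as np

def factor_profile(X, y, n_factors):
    """Profile estimated latent factors out of X and y (illustrative sketch)."""
    Xc = X - X.mean(axis=0)          # center each predictor
    yc = y - y.mean()                # center the response
    # Leading eigenvectors of the n x n Gram matrix serve as estimated
    # factor scores (identified only up to rotation and scale).
    gram = Xc @ Xc.T / X.shape[1]
    _, eigvecs = np.linalg.eigh(gram)        # eigenvalues in ascending order
    Z = eigvecs[:, -n_factors:]              # n x n_factors, orthonormal columns
    # Remove the part of X and y explained by the estimated factors.
    X_prof = Xc - Z @ (Z.T @ Xc)
    y_prof = yc - Z @ (Z.T @ yc)
    return X_prof, y_prof

def profiled_independent_screening(X, y, n_factors, n_keep):
    """Rank predictors by absolute marginal correlation after profiling."""
    X_prof, y_prof = factor_profile(X, y, n_factors)
    num = X_prof.T @ y_prof
    denom = np.linalg.norm(X_prof, axis=0) * np.linalg.norm(y_prof)
    score = np.abs(num / denom)
    return np.argsort(score)[::-1][:n_keep]  # indices of the top n_keep scores

# Simulated example (assumed setup): p >> n, two latent factors,
# and only the first three predictors are relevant.
rng = np.random.default_rng(0)
n, p, d = 200, 2000, 2
factors = rng.standard_normal((n, d))
loadings = rng.standard_normal((p, d))
X = factors @ loadings.T + rng.standard_normal((n, p))
y = X[:, :3] @ np.array([3.0, 3.0, 3.0]) + rng.standard_normal(n)

selected = profiled_independent_screening(X, y, n_factors=d, n_keep=10)
print(sorted(selected.tolist()))
```

In practice the number of factors would have to be chosen from the data, and the screening size `n_keep` is a tuning choice; both are fixed by hand here only to keep the sketch short.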