Feature screening and variable selection for partially linear models with ultrahigh-dimensional longitudinal data

Jingyuan Liu
DOI: https://doi.org/10.1016/j.neucom.2015.09.122
IF: 6
2016-06-01
Neurocomputing
Abstract: This paper is concerned with longitudinal partially linear models (LPLMs) with ultrahigh-dimensional covariates. As a flexible extension of linear regression models that allows a nonparametric intercept function to capture the overall trend over time, LPLMs are expected to be highly useful statistical models for analyzing high-dimensional longitudinal data such as longitudinal genetic data and functional magnetic resonance imaging data. Feature screening and variable selection are indispensable for LPLMs in the presence of ultrahigh-dimensional covariates such as genetic markers and all pixels in image data. This paper proposes a two-stage variable selection procedure for ultrahigh-dimensional LPLMs, consisting of a quick screening stage and a post-screening refining stage. The proposed approach is based on the partial residual method for dealing with the nonparametric baseline function. We establish the sure screening property of the screening procedure in the first stage. Simulation results demonstrate the validity of this two-stage method. We further illustrate the proposed methodology with an empirical analysis of a real data set collected in a longitudinal genetic study of soybean plants.
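The abstract does not state the exact screening statistic, so the following is only a rough sketch of the general idea under stated assumptions, not the paper's procedure. It assumes a kernel-based partial residual step (smooth the response and each covariate on time t with a Gaussian Nadaraya-Watson smoother and keep the residuals, which removes the nonparametric baseline trend), followed by SIS-style marginal ranking of covariates by absolute correlation between partial residuals. The function names (`smoother_weights`, `partial_residual`, `abs_corr`, `screen`), the bandwidth `h`, the toy data, and the pooling of all subjects' visits under working independence are hypothetical illustration choices. Only the first (screening) stage is shown; the post-screening refining stage is not.

```python
import math
import random

def smoother_weights(t, h):
    """Row-normalised Gaussian-kernel (Nadaraya-Watson) weights in the time variable t."""
    W = []
    for ti in t:
        row = [math.exp(-0.5 * ((ti - tj) / h) ** 2) for tj in t]
        total = sum(row)  # never zero: the self-weight exp(0) = 1 is included
        W.append([w / total for w in row])
    return W

def partial_residual(W, v):
    """Return v minus its kernel-smoothed trend in t (an estimate of v - E[v | t])."""
    return [vi - sum(w * vj for w, vj in zip(Wi, v)) for vi, Wi in zip(v, W)]

def abs_corr(a, b):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        return 0.0
    return abs(sab) / math.sqrt(saa * sbb)

def screen(t, y, columns, d, h=0.1):
    """Hypothetical screening rule: keep the d covariates whose partial
    residuals correlate most strongly (in absolute value) with y's."""
    W = smoother_weights(t, h)  # shared by y and every covariate, built once
    ry = partial_residual(W, y)
    scores = [abs_corr(ry, partial_residual(W, xj)) for xj in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:d]

# Toy longitudinal data (assumed, not from the paper): 30 subjects x 5 visits,
# pooled across subjects under working independence.
random.seed(0)
n_subjects, n_visits, p = 30, 5, 50
n_obs = n_subjects * n_visits
t = [random.random() for _ in range(n_obs)]
columns = [[random.gauss(0, 1) for _ in range(n_obs)] for _ in range(p)]
b = [random.gauss(0, 0.5) for _ in range(n_subjects)]  # subject random effects
# True model: baseline trend sin(2*pi*t), subject effect, active covariates 0 and 3.
y = [math.sin(2 * math.pi * ti) + b[i // n_visits]
     + 2.0 * columns[0][i] - 1.5 * columns[3][i] + random.gauss(0, 0.5)
     for i, ti in enumerate(t)]

selected = screen(t, y, columns, d=5)
print(selected)
```

In this toy setting the two active covariates should appear among the retained indices; a refining stage (for example, a penalized fit restricted to `selected`) would then be applied to the reduced set.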
computer science, artificial intelligence