A Model-averaging Method for High-Dimensional Regression with Missing Responses at Random

Jinhan Xie, Xiaodong Yan, Niansheng Tang
DOI: https://doi.org/10.5705/ss.202018.0297
IF: 1.4
2019-01-01
Statistica Sinica
Abstract: This study considers the ultrahigh-dimensional prediction problem in the presence of responses that are missing at random. A two-step model-averaging procedure is proposed to improve the accuracy of predicting the conditional mean of the response variable. The first step specifies several candidate models, each built on a low-dimensional set of predictors. To implement this step, a new feature-screening method, multiple-imputation sure independence screening (MI-SIS), is developed to distinguish active from inactive predictors, and candidate models are formed by grouping covariates whose MI-SIS values are of similar magnitude. The second step develops a new criterion, weighted delete-one cross-validation (WDCV), to find the optimal weights for averaging the candidate models. Under some regularity conditions, we show that the proposed screening statistic enjoys the ranking consistency property and that the WDCV criterion asymptotically achieves the lowest possible prediction loss. Simulation studies and an example illustrate the proposed methodology.
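The two-step procedure described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Several pieces are hypothetical choices made only for the sketch: the nearest-neighbour hot-deck imputation inside `mi_sis`, the rule in `group_candidates` that cuts the top-ranked covariates into equal-size blocks of adjacent ranks, the reading of "weighted" in `wdcv_weights` as inverse-propensity weighting of complete-case delete-one residuals, the restriction of the weights to the probability simplex, and a propensity function treated as known. All function names are illustrative, and the data are simulated.

```python
import numpy as np


def mi_sis(X, y, observed, n_imputations=5, n_neighbors=5, rng=None):
    """Hypothetical MI-SIS-style statistic (assumed form, not the paper's).

    For each covariate j, each missing response is imputed n_imputations
    times by drawing one of the n_neighbors observed responses closest in
    X_j (assumed hot-deck imputation); the statistic is the mean absolute
    Pearson correlation between X_j and the completed response.
    """
    rng = np.random.default_rng(rng)
    obs_idx = np.flatnonzero(observed)
    mis_idx = np.flatnonzero(~observed)
    rows = np.arange(len(mis_idx))
    omega = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        xj = X[:, j]
        # Distance along X_j from every missing unit to every observed unit.
        dist = np.abs(xj[mis_idx, None] - xj[None, obs_idx])
        nbrs = np.argsort(dist, axis=1)[:, :n_neighbors]
        total = 0.0
        for _ in range(n_imputations):
            pick = rng.integers(0, n_neighbors, size=len(mis_idx))
            y_full = y.copy()
            y_full[mis_idx] = y[obs_idx[nbrs[rows, pick]]]
            total += abs(np.corrcoef(xj, y_full)[0, 1])
        omega[j] = total / n_imputations
    return omega


def group_candidates(omega, n_keep=20, n_groups=4):
    """Assumed grouping rule: keep the n_keep top-ranked covariates and cut
    them into n_groups blocks of adjacent ranks (similar omega values)."""
    top = np.argsort(omega)[::-1][:n_keep]
    return np.array_split(top, n_groups)


def loo_residuals(Xs, ys):
    """Delete-one OLS residuals (with intercept) via e_i / (1 - h_ii)."""
    Z = np.column_stack([np.ones(len(ys)), Xs])
    coef, *_ = np.linalg.lstsq(Z, ys, rcond=None)
    Q, _ = np.linalg.qr(Z)
    h = np.sum(Q**2, axis=1)  # diagonal of the hat matrix
    return (ys - Z @ coef) / (1.0 - h)


def project_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def wdcv_weights(X, y, observed, propensity, groups, n_iter=5000):
    """Minimize an assumed inverse-propensity weighted delete-one CV
    criterion (1/n) * sum_obs (sum_k w_k e_ik)^2 / pi_i over the simplex."""
    obs = np.flatnonzero(observed)
    E = np.column_stack([loo_residuals(X[obs][:, g], y[obs]) for g in groups])
    d = 1.0 / propensity[obs]
    A = (E * d[:, None]).T @ E / len(y)  # criterion is w' A w
    step = 1.0 / (2.0 * np.linalg.eigvalsh(A).max())
    w = np.full(len(groups), 1.0 / len(groups))
    for _ in range(n_iter):
        w = project_simplex(w - step * 2.0 * (A @ w))  # projected gradient step
    return w


# Simulated example: 5 active predictors out of p = 500; responses are
# missing at random with a (known, assumed) propensity depending on X_0.
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.standard_normal((n, p))
y = X[:, :5] @ np.array([3.0, 2.5, 2.0, 1.5, 1.0]) + rng.standard_normal(n)
propensity = 1.0 / (1.0 + np.exp(-(1.0 + 0.5 * X[:, 0])))
observed = rng.random(n) < propensity
y = np.where(observed, y, np.nan)

omega = mi_sis(X, y, observed, rng=1)
groups = group_candidates(omega)
w = wdcv_weights(X, y, observed, propensity, groups)
print("top block:", groups[0], "weights:", np.round(w, 3))
```

The delete-one residuals use the OLS identity e_i / (1 - h_ii), so no candidate model is refitted n times. The simplex-constrained quadratic criterion is solved here by projected gradient descent with a Euclidean projection onto the simplex; any quadratic-programming solver could replace that loop.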