A Model-Averaging Approach for High-Dimensional Regression

Tomohiro Ando, Ker-Chau Li
DOI: https://doi.org/10.1080/01621459.2013.838168
IF: 4.369
2014-01-02
Journal of the American Statistical Association
Abstract: This article considers high-dimensional regression problems in which the number of predictors p exceeds the sample size n. We develop a model-averaging procedure for high-dimensional regression problems. Unlike most variable selection studies featuring the identification of true predictors, our focus here is on the prediction accuracy for the true conditional mean of y given the p predictors. Our method consists of two steps. The first step is to construct a class of regression models, each with a smaller number of regressors, to avoid the degeneracy of the information matrix. The second step is to find suitable model weights for averaging. To minimize the prediction error, we estimate the model weights using a delete-one cross-validation procedure. Departing from the literature of model averaging that requires the weights always sum to one, an important improvement we introduce is to remove this constraint. We derive some theoretical results to justify our procedure. A theorem is proved, showing that delete-one cross-validation achieves the lowest possible prediction loss asymptotically. This optimality result requires a condition that unravels an important feature of high-dimensional regression. The prediction error of any individual model in the class for averaging is required to be higher than the classic root n rate under the traditional parametric regression. This condition reflects the difficulty of high-dimensional regression and it depicts a situation especially meaningful for p > n. We also conduct a simulation study to illustrate the merits of the proposed approach over several existing methods, including lasso, group lasso, forward regression, Phase Coupled (PC)-simple algorithm, Akaike information criterion (AIC) model-averaging, Bayesian information criterion (BIC) model-averaging methods, and SCAD (smoothly clipped absolute deviation). This approach uses quadratic programming to overcome the computing time issue commonly encountered in the cross-validation literature. Supplementary materials for this article are available online.
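The weight-estimation step described in the abstract can be written out as a short sketch. The notation here is introduced for illustration and is not taken from the source: \(M\) is the number of candidate models, \(\tilde{\mu}_k^{(-i)}\) is the prediction of \(y_i\) from model \(k\) fitted without observation \(i\), and \(\tilde{P}\) is the \(n \times M\) matrix collecting these delete-one predictions. The box constraint \(0 \le w_k \le 1\) is an assumption; the abstract states only that the weights need not sum to one.

\[
\mathrm{CV}(w) \;=\; \sum_{i=1}^{n} \Bigl( y_i - \sum_{k=1}^{M} w_k \, \tilde{\mu}_k^{(-i)} \Bigr)^{2}
\;=\; w^{\top} \tilde{P}^{\top} \tilde{P} \, w \;-\; 2 \, y^{\top} \tilde{P} \, w \;+\; y^{\top} y ,
\qquad 0 \le w_k \le 1 .
\]

Because \(\mathrm{CV}(w)\) is quadratic in \(w\) and the assumed constraints are linear, minimizing it is a quadratic program, which is consistent with the abstract's remark about computing time.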
statistics & probability
What problem does this paper attempt to address?
This paper addresses prediction accuracy in high-dimensional regression, especially when the number of predictors \(p\) exceeds the sample size \(n\). Specifically, the authors propose a model-averaging method to improve prediction of the true conditional mean of \(y\) given the \(p\) predictors. Unlike most variable-selection studies, which aim to identify the true predictors, this work focuses on improving prediction accuracy through model averaging.

### Main contributions:

1. **Algorithm Feasibility**: The proposed algorithm is computationally feasible even in the presence of thousands of covariates.
2. **Relaxing Weight Constraints**: For the first time, the standard restriction that the model weights must sum to 1 is removed, and this relaxation is shown to matter for prediction performance.
3. **Theoretical Analysis**: Theoretical results show that minimizing the cross-validation criterion asymptotically minimizes the squared error between the true mean and the predicted value, an "oracle" property similar to that of Li (1986, 1987) and Shao (1997) in the context of model selection.
4. **Unique Challenges in High-Dimensional Regression**: The theory reveals an important distinction: the prediction error of any individual model used for averaging must be higher than the classical root-\(n\) rate of traditional parametric regression. This reflects the difficulty of high-dimensional regression, especially when \(p>n\).
5. **Setting the Number of Models**: A practical method is proposed for choosing the number of models to be averaged. Simulation studies show higher prediction accuracy than many existing methods, such as lasso, group lasso, the PC-simple algorithm, AIC model averaging, BIC model averaging, and SCAD.

### Method Steps:

1. **Prepare Candidate Models**: Construct a set of regression models, each containing a smaller number of regressors, to avoid the degeneracy of the information matrix.
2. **Optimize Model Weights**: Use delete-one cross-validation to determine the model weights that minimize the prediction error. Unlike traditional model-averaging methods, the weights are not required to sum to 1, which improves prediction performance.

### Theoretical Results:

- **Asymptotic Optimality**: Under certain assumptions, delete-one cross-validation is proved to achieve the lowest possible prediction loss asymptotically.
- **Condition Analysis**: The meaning of these assumptions is discussed in detail, especially in the context of high-dimensional regression, and a reasonable upper limit for the number of models \(M\) is proposed.

### Simulation Studies:

- The proposed method is compared with existing methods in simulation, and the results favor it in prediction accuracy.

In conclusion, the paper tackles prediction in high-dimensional regression with a new model-averaging method, supported by an asymptotic optimality result and by simulation evidence.
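The two method steps summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made for illustration only: candidate models are built by ranking predictors on absolute marginal correlation with \(y\) and cutting the ranking into equal blocks, the data are centered so no intercept is fitted, the weights are restricted to \([0, 1]\) without a sum constraint, and a simple projected-gradient loop stands in for a proper quadratic-programming solver. All function names (`build_groups`, `loo_fitted_values`, `box_qp_weights`, `fit_model_average`, `predict`) are hypothetical.

```python
import numpy as np


def build_groups(X, y, n_models, group_size):
    """Step 1 (assumed construction): rank predictors by |marginal correlation|
    with y and cut the ranking into n_models blocks of group_size columns."""
    score = np.abs(X.T @ y) / (np.linalg.norm(X, axis=0) * np.linalg.norm(y))
    order = np.argsort(-score)
    return [order[k * group_size:(k + 1) * group_size] for k in range(n_models)]


def loo_fitted_values(Z, y):
    """Delete-one predictions for least squares of y on Z (no intercept)."""
    H = Z @ np.linalg.pinv(Z)  # projection ("hat") matrix onto the columns of Z
    h = np.diag(H)
    # Leave-one-out shortcut: y_i - yhat_(-i) = (y_i - yhat_i) / (1 - h_ii).
    return y - (y - H @ y) / (1.0 - h)


def box_qp_weights(P, y, n_iter=5000):
    """Step 2: minimize ||y - P w||^2 subject to 0 <= w_k <= 1.
    Projected gradient descent is used here as a stand-in for a QP solver."""
    A = P.T @ P
    b = P.T @ y
    # Step size 1 / lambda_max(A), the Lipschitz constant of the gradient A w - b.
    step = 1.0 / np.linalg.eigvalsh(A).max()
    w = np.zeros(P.shape[1])
    for _ in range(n_iter):
        w = np.clip(w - step * (A @ w - b), 0.0, 1.0)
    return w


def fit_model_average(X, y, n_models=5, group_size=10):
    """Build candidate models, compute delete-one predictions, estimate weights."""
    groups = build_groups(X, y, n_models, group_size)
    P = np.column_stack([loo_fitted_values(X[:, g], y) for g in groups])
    w = box_qp_weights(P, y)
    coefs = [np.linalg.lstsq(X[:, g], y, rcond=None)[0] for g in groups]
    return groups, coefs, w


def predict(X_new, groups, coefs, w):
    """Weighted sum of the candidate models' predictions."""
    return sum(wk * (X_new[:, g] @ c) for g, c, wk in zip(groups, coefs, w))


# Synthetic p > n example (made-up data, for illustration only).
rng = np.random.default_rng(0)
n, p = 60, 200
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 1.0
y = X @ beta + rng.standard_normal(n)
X = X - X.mean(axis=0)  # center so that no intercept is needed
y = y - y.mean()

groups, coefs, w = fit_model_average(X, y)
y_hat = predict(X, groups, coefs, w)
print("estimated weights:", np.round(w, 3))
```

Each candidate model uses far fewer regressors than observations, so its least-squares fit is well defined even though \(p > n\). In practice a dedicated bounded least-squares or QP routine (for example `scipy.optimize.lsq_linear` with `bounds=(0, 1)`) could replace the projected-gradient loop.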