Model averaging: A shrinkage perspective

Jingfu Peng
2024-04-28
Abstract: Model averaging (MA), a technique for combining estimators from a set of candidate models, has attracted increasing attention in machine learning and statistics. In the existing literature, there is an implicit understanding that MA can be viewed as a form of shrinkage estimation that draws the response vector towards the subspaces spanned by the candidate models. This paper explores this perspective by establishing connections between MA and shrinkage in a linear regression setting with multiple nested models. We first demonstrate that the optimal MA estimator is the best linear estimator with monotonically non-increasing weights in a Gaussian sequence model. The Mallows MA (MMA), which estimates weights by minimizing the Mallows' $C_p$ over the unit simplex, can be viewed as a variation of the sum of a set of positive-part Stein estimators. Indeed, the latter estimator differs from the MMA only in that its optimization of Mallows' $C_p$ is within a suitably relaxed weight set. Motivated by these connections, we develop a novel MA procedure based on a blockwise Stein estimation. The resulting Stein-type MA estimator is asymptotically optimal across a broad parameter space when the variance is known. Numerical results support our theoretical findings. The connections established in this paper may open up new avenues for investigating MA from different perspectives. A discussion on some topics for future research concludes the paper.
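To make the Mallows $C_p$ connection concrete, here is a minimal NumPy sketch under assumptions not stated in the abstract: a Gaussian sequence model $y_i = \theta_i + \varepsilon_i$ with known $\sigma^2$, nested candidate models that keep the first $k_1 < \dots < k_M$ coordinates, plus a null model. Averaging these fits with simplex weights $w$ multiplies every coordinate in the $j$-th increment block by $\gamma_j = \sum_{m \ge j} w_m$, so the MA estimator is a linear estimator with non-increasing, blockwise-constant factors. Mallows' $C_p$ then becomes $\sum_j [S_j(1-\gamma_j)^2 + 2\sigma^2 d_j \gamma_j]$ plus a constant, where $S_j$ and $d_j$ are the sum of squares and dimension of block $j$. Minimizing each term separately over $[0,1]$ gives the positive-part factor $(1 - \sigma^2 d_j / S_j)_+$, a Stein-type rule with $d_j$ in place of the James-Stein constant $d_j - 2$; keeping the ordering (the simplex) turns the problem into a weighted antitonic regression, solved here by pool-adjacent-violators. All function names are hypothetical, and $[0,1]^M$ is one illustrative relaxation, not necessarily the relaxed weight set used in the paper.

```python
import numpy as np


def antitonic_fit(t, s):
    """Weighted least-squares fit of a non-increasing sequence to t (pool adjacent violators)."""
    blocks = []  # each entry: [pooled value, total weight, number of merged entries]
    for ti, si in zip(t, s):
        blocks.append([float(ti), float(si), 1])
        # Merge while the ordering constraint gamma_j >= gamma_{j+1} is violated.
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            v2, w2, c2 = blocks.pop()
            v1, w1, c1 = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    return np.concatenate([np.full(c, v) for v, _, c in blocks])


def block_stats(y, ks):
    """Sum of squares S_j and dimension d_j of each increment block between nested models."""
    edges = np.concatenate(([0], np.asarray(ks)))
    d = np.diff(edges).astype(float)
    S = np.array([np.sum(y[a:b] ** 2) for a, b in zip(edges[:-1], edges[1:])])
    return S, d


def mallows_cp(gamma, y, ks, sigma2):
    """Mallows' C_p of the linear estimator that multiplies block j of y by gamma_j."""
    S, d = block_stats(y, ks)
    tail = np.sum(y[ks[-1]:] ** 2)  # coordinates outside the largest model are set to 0
    return float(np.sum(S * (1 - gamma) ** 2 + 2 * sigma2 * d * gamma) + tail)


def cp_shrinkage(y, ks, sigma2):
    """Minimize C_p over (a) independent factors in [0, 1] and (b) the MA simplex."""
    S, d = block_stats(y, ks)
    t = 1.0 - sigma2 * d / S                       # per-block unconstrained minimizer
    relaxed = np.clip(t, 0.0, 1.0)                 # positive-part Stein-type factors
    mma = np.clip(antitonic_fit(t, S), 0.0, 1.0)   # non-increasing factors <=> simplex weights
    return relaxed, mma


def weights_from_gamma(gamma):
    """Recover MA weights (null model first) from non-increasing factors gamma."""
    nxt = np.append(gamma[1:], 0.0)
    return np.concatenate(([1.0 - gamma[0]], gamma - nxt))


if __name__ == "__main__":
    y = np.array([3.0, 2.5, 0.2, 0.1, 1.5, 1.4])
    ks = [1, 2, 4, 6]  # hypothetical nested model sizes
    relaxed, mma = cp_shrinkage(y, ks, sigma2=1.0)
    print("relaxed factors:", relaxed)
    print("MMA factors:    ", mma)
    print("MMA weights:    ", weights_from_gamma(mma))
```

When the antitonic fit pools adjacent blocks, their common factor equals $1 - \sigma^2 (d_j + d_{j+1}) / (S_j + S_{j+1})$, i.e. the same Stein-type factor computed on the merged block, which is one way to read the MMA as an order-constrained version of the blockwise positive-part rule.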
Statistics Theory
What problem does this paper attempt to address?
The paper addresses the relationship between model averaging (MA) and shrinkage estimation when several candidate models are combined. Specifically, the author asks whether MA can be viewed as a form of shrinkage estimation in a linear regression setting with multiple nested models. Building on the connections established between the two, the paper proposes a new MA procedure based on blockwise Stein estimation that is asymptotically optimal.

### Main research questions

1. **Relationship between model averaging and shrinkage estimation**:
   - Can MA be regarded as a form of shrinkage estimation when multiple candidate models are combined?
   - By establishing theoretical connections between the two, the author shows that in the nested-model setting MA can indeed be interpreted as shrinkage.
2. **Achieving asymptotic optimality**:
   - The paper proposes a new MA procedure based on blockwise Stein estimation, designed to be asymptotically optimal.
   - The theory shows that, when the error variance is known, the procedure asymptotically attains the optimal MA risk over a broad parameter space; numerical experiments support these findings.

### Research background

- **Model averaging (MA)**: a technique that combines estimators from multiple candidate models to improve prediction and estimation accuracy. It has received increasing attention in machine learning and statistics in recent years.
- **Shrinkage estimation**: pulling estimates towards a fixed point or subspace, which trades some bias for a reduction in variance and can lower the overall mean squared error. A classical example is the James-Stein estimator.

### Research contributions

1. **Theoretical connection**:
   - In a Gaussian sequence model, the optimal MA estimator is the best linear estimator with monotonically non-increasing weights.
   - The Mallows MA (MMA) estimator, which chooses weights by minimizing Mallows' $C_p$ over the unit simplex, can be viewed as a variation of the sum of a set of positive-part Stein estimators acting on orthogonal subspaces; the two differ only in that the latter minimizes Mallows' $C_p$ over a suitably relaxed weight set.
2. **New method**:
   - A Stein-type MA procedure based on blockwise Stein estimation is proposed. With an appropriately constructed set of candidate models, it realizes the full potential of MA over a sufficiently large parameter space.
   - Numerical experiments support the theoretical results and indicate that the method also performs well in finite samples.

### Key conclusions

- **Connection between MA and shrinkage estimation**: in the nested-model setting, MA can be regarded as a form of shrinkage estimation, which offers a new angle for analyzing MA.
- **Asymptotic optimality**: under appropriate conditions, including a known error variance, the proposed blockwise Stein MA procedure asymptotically achieves the minimum risk attainable by averaging the nested candidate models.

### Future research directions

- Explore the relationship between MA and shrinkage estimation in more complex model settings.
- Develop more efficient and robust MA methods for different application scenarios.

Overall, the paper deepens the understanding of the relationship between MA and shrinkage estimation and provides new tools and theoretical foundations for applying MA in practice.
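The blockwise Stein idea behind the proposed procedure can be illustrated with the classical positive-part James-Stein rule applied block by block. The sketch below assumes a Gaussian sequence model with known $\sigma^2$ and a user-supplied partition of the coordinates into blocks; it is the textbook blockwise rule, not the paper's Stein-type MA estimator, whose block construction and constants are not reproduced here. `blockwise_js` and `mc_risk` are hypothetical helper names.

```python
import numpy as np


def blockwise_js(y, block_sizes, sigma2):
    """Positive-part James-Stein shrinkage applied separately to each block of y.

    Assumes y ~ N(theta, sigma2 * I) with sigma2 known; block_sizes must sum to len(y).
    Blocks of size <= 2 are left unshrunk because the constant d - 2 is not positive.
    """
    y = np.asarray(y, dtype=float)
    if sum(block_sizes) != y.size:
        raise ValueError("block sizes must sum to len(y)")
    est = np.empty_like(y)
    start = 0
    for d in block_sizes:
        yb = y[start:start + d]
        s = float(yb @ yb)                 # squared norm of the block
        c = max(d - 2, 0)                  # James-Stein constant for this block
        if c == 0 or s == 0.0:
            factor = 1.0
        else:
            factor = max(0.0, 1.0 - c * sigma2 / s)  # positive part
        est[start:start + d] = factor * yb
        start += d
    return est


def mc_risk(theta, block_sizes, sigma2=1.0, reps=500, seed=0):
    """Monte Carlo estimate of E||theta_hat - theta||^2 for y itself and for blockwise JS."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    risk_mle = risk_js = 0.0
    for _ in range(reps):
        y = theta + rng.normal(scale=np.sqrt(sigma2), size=theta.size)
        risk_mle += np.sum((y - theta) ** 2)
        risk_js += np.sum((blockwise_js(y, block_sizes, sigma2) - theta) ** 2)
    return risk_mle / reps, risk_js / reps


if __name__ == "__main__":
    # Hypothetical signal: one strong block followed by one pure-noise block.
    theta = np.concatenate([np.full(20, 3.0), np.zeros(20)])
    print("risk (y, blockwise JS):", mc_risk(theta, [20, 20]))
```

Within each block of dimension $d \ge 3$, the factor $\max(0, 1 - (d-2)\sigma^2 / \|y_B\|^2)$ sets noise-dominated blocks to zero and leaves strong-signal blocks nearly untouched, so the blockwise rule adapts the amount of shrinkage to each block's signal strength.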