Abstract: When multiple models are considered in regression problems, the model averaging method can be used to weigh and integrate the models. In the present study, we examined how the goodness-of-prediction of the estimator depends on the dimensionality of explanatory variables when using a generalization of the model averaging method in a linear model. We specifically considered the case of high-dimensional explanatory variables, with multiple linear models deployed for subsets of these variables. Consequently, we derived the optimal weights that yield the best predictions. We also observe that the double-descent phenomenon occurs in the model averaging estimator. Furthermore, we obtained theoretical results by adapting methods such as the random forest to linear regression models. Finally, we conducted a practical verification through numerical experiments.
What problem does this paper attempt to address?
### Problems the paper attempts to solve
This paper explores how the predictive performance of model averaging estimators depends on the dimension of the explanatory variables in a high-dimensional setting. Specifically, the researchers considered model averaging over multiple linear models, each fit on a subset of the variables, and derived the weights that give the best prediction. They also observed the "double descent" phenomenon in the model averaging estimator and verified the theory through numerical experiments.
### Main contributions
1. **Accurate analysis of model averaging estimators**:
- Using Random Matrix Theory (RMT), the researchers calculated and described the predictive performance of model averaging estimators in linear models under the assumption of sample isotropy.
- The research results show that the "double descent" phenomenon also occurs when using this estimator.
- The researchers derived the asymptotic behavior of each model when samples and features are selected at random in a high-dimensional setting.
2. **Optimal weights**:
- The model averaging estimator consists of a weight vector and multiple minimum-norm least-squares estimators. The weight vector can be optimized to achieve the best prediction of the true value.
- The researchers derived the exact theoretical curve of the prediction risk of this estimator and obtained the optimal weight vector according to the conditions assumed in the study.
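The structure described above (a weight vector combined with several minimum-norm least-squares estimators) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the paper's derivation: the function names `candidate_coefs`, `risk_matrix`, and `constrained_weights` are hypothetical, the test features are assumed isotropic, and the weights are "oracle" weights computed from a known true coefficient vector in a simulation rather than from the paper's asymptotic risk formula.

```python
import numpy as np

def candidate_coefs(X, y, feature_sets):
    """Fit one minimum-norm least-squares estimator per feature subset.

    Returns an (m, p) matrix whose i-th row is the i-th candidate's
    coefficient vector, zero outside its feature subset.
    """
    p = X.shape[1]
    B = np.zeros((len(feature_sets), p))
    for i, S in enumerate(feature_sets):
        # The pseudoinverse gives the minimum-norm solution when |S| > n
        # and the ordinary least-squares solution when X[:, S] has full column rank.
        B[i, S] = np.linalg.pinv(X[:, S]) @ y
    return B

def risk_matrix(B, beta_true):
    """Gram matrix of the candidates' coefficient errors.

    Assumption: a test point x has E[x x^T] = I, so for weights summing to one
    the excess prediction risk of the averaged estimator is w^T R w.
    """
    D = B - beta_true
    return D @ D.T

def constrained_weights(R):
    """Minimize w^T R w subject to sum(w) = 1, for positive-definite R."""
    z = np.linalg.solve(R, np.ones(R.shape[0]))
    return z / z.sum()
```

With `B` and weights `w` in hand, the averaged coefficient vector is simply `w @ B`. Because the weights are only constrained to sum to one, individual entries may be negative.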
### High-dimensional asymptotic framework
The researchers assumed the following high-dimensional asymptotic conditions:
1. **Data generation**: The entries of the data matrix \( X_n\in\mathbb{R}^{n\times p} \) are independent and identically distributed, satisfying \( E[X_{n,ij}] = 0 \), \( \text{Var}[X_{n,ij}] = 1 \), and \( E[|X_{n,ij}|^{12 + \omega}]<\infty \) for some \( \omega>0 \).
2. **Sample size and dimension**: The sample size \( n\rightarrow\infty \), the dimension \( p\rightarrow\infty \), and the ratio \( p / n\rightarrow\gamma>0 \).
3. **Dimension of candidate models**: The dimension of candidate models \( |S_n^i|\rightarrow\infty \), \( |S_n^i\cap S_n^j|\rightarrow\infty \) as \( n\rightarrow\infty \); moreover, \( |S_n^i|/n\rightarrow\gamma_i>0 \), \( |S_n^i\cap S_n^j|/n\rightarrow\gamma_{ij}>0 \) for any \( i, j \).
4. **Samples used by candidate models**: The number of samples used by candidate models \( |T_n^i|\rightarrow\infty \) as \( n\rightarrow\infty \); moreover, \( |T_n^i|/n\rightarrow\eta_i>0 \), \( |T_n^i\cap T_n^j|/n\rightarrow\eta_{ij}>0 \) for any \( i, j \).
5. **Weight vector**: The weight vector \( w_n\in\mathbb{R}^m \) satisfies \( \sum_{i = 1}^m w_{n,i}=1 \), and converges to the weight vector \( w\in\mathbb{R}^m \) when \( n\rightarrow\infty \), \( p\rightarrow\infty \), \( p / n\rightarrow\gamma \).
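To make these conditions concrete, the following sketch draws one finite-size instance of the setup: a data matrix with i.i.d. standardized entries, feature index sets \( S_n^i \), and sample index sets \( T_n^i \) whose sizes match prescribed ratios. The helper names `draw_setup` and `overlap_ratios` are hypothetical, Gaussian entries are just one distribution satisfying the moment condition, and drawing the index sets uniformly at random is an assumption made for illustration; the framework itself only requires that the ratios converge.

```python
import numpy as np

def draw_setup(n, gamma, gammas, etas, seed=0):
    """Draw X and index sets with |S^i| / n = gammas[i] and |T^i| / n = etas[i].

    Requires gammas[i] <= gamma and etas[i] <= 1 so the subsets fit.
    """
    rng = np.random.default_rng(seed)
    p = int(round(gamma * n))
    # Standard normal entries: mean 0, variance 1, all moments finite.
    X = rng.standard_normal((n, p))
    feature_sets = [
        np.sort(rng.choice(p, size=int(round(g * n)), replace=False))
        for g in gammas
    ]
    sample_sets = [
        np.sort(rng.choice(n, size=int(round(e * n)), replace=False))
        for e in etas
    ]
    return X, feature_sets, sample_sets

def overlap_ratios(index_sets, n):
    """Matrix of |A_i ∩ A_j| / n; the diagonal holds the individual size ratios."""
    m = len(index_sets)
    out = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            out[i, j] = np.intersect1d(index_sets[i], index_sets[j]).size / n
    return out
```

Calling `overlap_ratios(feature_sets, n)` gives finite-sample counterparts of \( \gamma_i \) (diagonal) and \( \gamma_{ij} \) (off-diagonal); the same call on `sample_sets` gives \( \eta_i \) and \( \eta_{ij} \). Under the uniform-sampling assumption used here, the off-diagonal feature ratios should be close to \( \gamma_i\gamma_j/\gamma \) for large \( n \).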
### Related work
- **Model averaging estimators in linear regression**: Many studies have explored estimating the target variable as a weighted sum of multiple linear models, with the weights usually determined by model selection criteria.
- **Random forests and distributed learning**: The researchers also considered splitting the data by sample and feature indices, an approach similar to random forests and distributed learning.