Abstract: We characterize the squared prediction risk of ensemble estimators obtained through subagging (subsample bootstrap aggregating) regularized M-estimators and construct a consistent estimator for the risk. Specifically, we consider a heterogeneous collection of $M \ge 1$ regularized M-estimators, each trained with (possibly different) subsample sizes, convex differentiable losses, and convex regularizers. We operate under the proportional asymptotics regime, where the sample size $n$, feature size $p$, and subsample sizes $k_m$ for $m \in [M]$ all diverge with fixed limiting ratios $n/p$ and $k_m/n$. Key to our analysis is a new result on the joint asymptotic behavior of correlations between the estimator and residual errors on overlapping subsamples, governed through a (provably) contractible nonlinear system of equations. Of independent interest, we also establish convergence of trace functionals related to degrees of freedom in the non-ensemble setting (with $M = 1$) along the way, extending previously known cases for square loss and ridge and lasso regularizers. When specialized to homogeneous ensembles trained with a common loss, regularizer, and subsample size, the risk characterization sheds some light on the implicit regularization effect due to the ensemble and subsample sizes $(M,k)$. For any ensemble size $M$, optimally tuning subsample size yields sample-wise monotonic risk. For the full-ensemble estimator (when $M \to \infty$), the optimal subsample size $k^\star$ tends to be in the overparameterized regime $(k^\star \le \min\{n,p\})$, when explicit regularization is vanishing. Finally, joint optimization of subsample size, ensemble size, and regularization can significantly outperform regularizer optimization alone on the full data (without any subagging).
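
To make the homogeneous setting in the abstract concrete, the following is a minimal sketch of subagging, assuming squared loss, a ridge regularizer, isotropic Gaussian features, and a linear model with Gaussian noise. These choices, the function names (`ridge_fit`, `subagged_ridge`, `prediction_risk`), and all numeric values are illustrative assumptions and are not taken from the paper. Under the isotropic-feature assumption, the squared prediction risk at a fresh test point reduces to the squared estimation error plus the noise variance, which is what the sketch computes over a small grid of $(M, k)$.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge estimator: argmin_b ||y - X b||^2 / k + lam * ||b||^2."""
    k, p = X.shape
    return np.linalg.solve(X.T @ X / k + lam * np.eye(p), X.T @ y / k)


def subagged_ridge(X, y, k, M, lam, rng):
    """Average M ridge fits, each trained on k rows drawn without replacement."""
    n = X.shape[0]
    fits = []
    for _ in range(M):
        idx = rng.choice(n, size=k, replace=False)  # one subsample of size k
        fits.append(ridge_fit(X[idx], y[idx], lam))
    return np.mean(fits, axis=0)


def prediction_risk(beta_hat, beta_true, noise_var):
    """E[(y0 - x0' beta_hat)^2] for an independent test point with x0 ~ N(0, I)."""
    return float(np.sum((beta_hat - beta_true) ** 2) + noise_var)


# Simulated data (assumed setup): n samples, p features, linear signal plus noise.
rng = np.random.default_rng(0)
n, p, noise_var = 400, 200, 1.0
beta_true = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = X @ beta_true + np.sqrt(noise_var) * rng.normal(size=n)

# Risk over a small grid of ensemble size M and subsample size k.
risks = {}
for k in (100, 200, 400):
    for M in (1, 10, 50):
        beta_hat = subagged_ridge(X, y, k=k, M=M, lam=0.1, rng=rng)
        risks[(M, k)] = prediction_risk(beta_hat, beta_true, noise_var)
        print(f"M={M:3d}  k={k:3d}  risk={risks[(M, k)]:.3f}")
```

With `k = n` every subsample is the full data set, so the ensemble coincides with the single ridge fit for any `M`. Comparing rows of the printed grid gives a rough empirical view of how $(M, k)$ interact with the explicit penalty `lam`, but a single simulated draw is only indicative and does not replace the asymptotic risk characterization.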
