Abstract: The Ridgeless minimum $\ell_2$-norm interpolator in overparametrized linear regression has attracted considerable attention in recent years. While it seems to defy the conventional wisdom that overfitting leads to poor prediction, recent research reveals that its norm minimizing property induces an 'implicit regularization' that helps prediction in spite of interpolation. This renders the Ridgeless interpolator a theoretically tractable proxy that offers useful insights into the mechanisms of modern machine learning methods. This paper takes a different perspective that aims at understanding the precise stochastic behavior of the Ridgeless interpolator as a statistical estimator. Specifically, we characterize the distribution of the Ridgeless interpolator in high dimensions, in terms of a Ridge estimator in an associated Gaussian sequence model with positive regularization, which plays the role of the prescribed implicit regularization in the context of prediction risk. Our distributional characterizations hold for general random designs and extend uniformly to positively regularized Ridge estimators. As a demonstration of the analytic power of these characterizations, we derive approximate formulae for a general class of weighted $\ell_q$ risks for Ridge(less) estimators that were previously available only for $\ell_2$. Our theory also provides certain further conceptual reconciliation with the conventional wisdom: given any data covariance, a certain amount of regularization in Ridge regression remains beneficial for 'most' signals across various statistical tasks including prediction, estimation and inference, as long as the noise level is non-trivial. Surprisingly, optimal tuning can be achieved simultaneously for all the designated statistical tasks by a single generalized or $k$-fold cross-validation scheme, despite being designed specifically for tuning prediction risk.
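For readers unfamiliar with the object studied here, the following is a minimal numerical sketch of the Ridgeless minimum $\ell_2$-norm interpolator and its relation to Ridge regression with vanishing regularization. It is not taken from the paper: the function names `ridge` and `ridgeless`, the $1/n$ scaling of the squared loss, the isotropic Gaussian design, and the noise level are all illustrative assumptions.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimator for (1/n)||y - X b||^2 + lam * ||b||^2 (assumed scaling)."""
    n = X.shape[0]
    # Dual form X^T (X X^T + n*lam*I)^{-1} y, convenient when p > n.
    return X.T @ np.linalg.solve(X @ X.T + n * lam * np.eye(n), y)

def ridgeless(X, y):
    """Minimum l2-norm solution of X b = y, computed via the pseudoinverse."""
    return np.linalg.pinv(X) @ y

# Synthetic overparametrized data (p > n); all settings are hypothetical.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta + 0.5 * rng.standard_normal(n)

b0 = ridgeless(X, y)

# The Ridgeless estimator fits the noisy training data exactly.
interp_err = np.max(np.abs(X @ b0 - y))

# Ridge with a tiny positive penalty is numerically close to the Ridgeless limit.
gap = np.linalg.norm(ridge(X, y, 1e-10) - b0)

# Any other interpolator differs by a null-space component and has a larger norm.
z = rng.standard_normal(p)
other = b0 + (z - np.linalg.pinv(X) @ (X @ z))
other_err = np.max(np.abs(X @ other - y))
```

In this sketch `interp_err` and `gap` should both be near machine precision, while `other` also interpolates but with a strictly larger $\ell_2$ norm, which is the norm minimizing property the abstract refers to.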

On the Optimal Weighted $\ell_2$ Regularization in Overparameterized Linear Regression

Understanding Implicit Regularization in Over-Parameterized Single Index Model

Criteria and Bias of Parameterized Linear Regression under Edge of Stability Regime

The distribution of Ridgeless least squares interpolators

Dimension free ridge regression

Implicit Regularization Paths of Weighted Neural Representations

Linear Convergence of Inexact Descent Method and Inexact Proximal Gradient Algorithms for Lower-Order Regularization Problems

Characterizing the SLOPE Trade-off: A Variational Perspective and the Donoho-Tanner Limit

Robust Implicit Regularization via Weight Normalization

Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network

Nonasymptotic theory for two-layer neural networks: Beyond the bias-variance trade-off

Harmless Overparametrization in Two-layer Neural Networks

A Statistical Theory of Regularization-Based Continual Learning

Optimal Rates for Coefficient-Based Regularized Regression

Regularization Methods for High-Dimensional Instrumental Variables Regression With an Application to Genetical Genomics

High-Dimensional Linear Regression via Implicit Regularization

On Choosing Initial Values of Iteratively Reweighted $\ell_1$ Algorithms for the Piece-wise Exponential Penalty

Computationally Efficient and Statistically Optimal Robust High-Dimensional Linear Regression

Kernel ridge vs. principal component regression: minimax bounds and adaptability of regularization operators

Optimal Ridge Regularization for Out-of-Distribution Prediction

A Risk Ratio Comparison of $l_0$ and $l_1$ Penalized Regression