Abstract: The Ridgeless minimum $\ell_2$-norm interpolator in overparametrized linear regression has attracted considerable attention in recent years. While it seems to defy the conventional wisdom that overfitting leads to poor prediction, recent research reveals that its norm-minimizing property induces an `implicit regularization' that helps prediction in spite of interpolation. This renders the Ridgeless interpolator a theoretically tractable proxy that offers useful insights into the mechanisms of modern machine learning methods. This paper takes a different perspective, aiming to understand the precise stochastic behavior of the Ridgeless interpolator as a statistical estimator. Specifically, we characterize the distribution of the Ridgeless interpolator in high dimensions in terms of a Ridge estimator in an associated Gaussian sequence model with positive regularization, which plays the role of the prescribed implicit regularization in the context of prediction risk. Our distributional characterizations hold for general random designs and extend uniformly to positively regularized Ridge estimators. As a demonstration of the analytic power of these characterizations, we derive approximate formulae for a general class of weighted $\ell_q$ risks for Ridge(less) estimators; such formulae were previously available only for $\ell_2$. Our theory also provides further conceptual reconciliation with the conventional wisdom: given any data covariance, a certain amount of regularization in Ridge regression remains beneficial for `most' signals across various statistical tasks, including prediction, estimation and inference, as long as the noise level is non-trivial. Surprisingly, optimal tuning can be achieved simultaneously for all the designated statistical tasks by a single generalized or $k$-fold cross-validation scheme, even though such a scheme is designed specifically for tuning prediction risk.
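For readers who want a concrete picture of the estimator discussed in the abstract, here is a minimal numerical sketch. It is not taken from the paper: the function names `ridge` and `ridgeless`, the Gaussian design, the problem sizes, and the $n\lambda$ scaling of the penalty are all illustrative assumptions. The sketch checks three standard facts: the Ridgeless estimator interpolates the data when $p > n$, it has the smallest $\ell_2$ norm among interpolators, and it is the limit of Ridge estimators as the regularization tends to zero.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimator in dual form: X^T (X X^T + n*lam*I)^{-1} y.
    The n*lam scaling is an assumed convention, not one fixed by the abstract."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + n * lam * np.eye(n), y)

def ridgeless(X, y):
    """Minimum l2-norm least squares solution, computed via the pseudoinverse."""
    return np.linalg.pinv(X) @ y

# Hypothetical overparametrized setup (p > n) with an isotropic Gaussian design.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta + 0.5 * rng.standard_normal(n)

b0 = ridgeless(X, y)

# 1) Interpolation: the fitted values reproduce y up to rounding error.
interp_residual = np.max(np.abs(X @ b0 - y))

# 2) Minimum norm: adding a null-space direction of X keeps the fit
#    unchanged but can only increase the l2 norm.
z = rng.standard_normal(p)
null_part = z - np.linalg.pinv(X) @ (X @ z)  # component of z in the null space of X
other = b0 + null_part
other_residual = np.max(np.abs(X @ other - y))

# 3) Ridge with a tiny penalty is numerically close to the Ridgeless solution.
gap = np.linalg.norm(ridge(X, y, 1e-8) - b0)

print(interp_residual, np.linalg.norm(b0), np.linalg.norm(other), gap)
```

The dual form is used in `ridge` because it only requires solving an $n \times n$ system, which is the cheaper choice when $p > n$.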

Surprises in high-dimensional ridgeless least squares interpolation

The distribution of Ridgeless least squares interpolators

On Ridge Estimation in High-dimensional Rotationally Sparse Linear Regression

Precise analysis of ridge interpolators under heavy correlations -- a Random Duality Theory view

Analysis of Interpolating Regression Models and the Double Descent Phenomenon

Dimension free ridge regression

Ridge interpolators in correlated factor regression models -- exact risk analysis

The generalization error of random features regression: Precise asymptotics and double descent curve

On Optimal Interpolation In Linear Regression

Near-Interpolators: Rapid Norm Growth and the Trade-Off between Interpolation and Generalization

On the Universality of the Double Descent Peak in Ridgeless Regression

Surprises in adversarially-trained linear regression

Harmless interpolation of noisy data in regression

Highly Adaptive Ridge

Optimal Rates of Kernel Ridge Regression under Source Condition in Large Dimensions

Optimistic Rates: A Unifying Theory for Interpolation Learning and Regularization in Linear Regression

Prediction Risk and Estimation Risk of the Ridgeless Least Squares Estimator under General Assumptions on Regression Errors

A geometrical viewpoint on the benign overfitting property of the minimum $l_2$-norm interpolant estimator and its universality

Towards Optimal Sobolev Norm Rates for the Vector-Valued Regularized Least-Squares Algorithm

On the Optimality of Misspecified Kernel Ridge Regression

On best approximation by multivariate ridge functions with applications to generalized translation networks