Abstract: In the linear regression model, the minimum $\ell_2$-norm interpolant estimator has received much attention since it was proved to be consistent even though it fits noisy data perfectly, under some condition on the covariance matrix of the input vector; this is known as benign overfitting. Motivated by this phenomenon, we study the generalization property of this estimator from a geometrical viewpoint. Our main results extend and improve the convergence rates as well as the deviation probability from (Tsigler et al. in J Mach Learn Res 24(123):1–76, 2021). Our proof differs from the classical bias/variance analysis and is based on the self-induced regularization property introduced in [5]: the estimator can be written as a sum of a ridge estimator and an overfitting component, following a decomposition of the feature space into the space spanned by the top $k$ eigenvectors of the covariance matrix and the space spanned by the remaining ones. We also prove a matching lower bound for the expected prediction risk and thus obtain necessary and sufficient conditions for benign overfitting of the estimator. The two geometrical properties of random Gaussian matrices at the heart of our analysis are the Dvoretsky-Milman theorem and the isomorphic and restricted isomorphic properties. In particular, the Dvoretsky dimension, which appears naturally in our geometrical viewpoint, coincides with the effective rank from (Bartlett Proc Natl Acad Sci 117(48), 30063–30070, 2020), (Tsigler et al. in J Mach Learn Res 24(123):1–76, 2021) and is the key tool for handling the behavior of the design matrix restricted to the subspace where overfitting happens. We extend these results to heavy-tailed scenarios, proving the universality of this phenomenon beyond exponential moment assumptions. This universality was previously unknown, and establishing it is widely believed to be a significant challenge. It follows from an anisotropic version of the probabilistic Dvoretsky-Milman theorem that holds for heavy-tailed vectors, which is of independent interest.
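The decomposition described in the abstract can be illustrated numerically. The following is only a minimal sketch under stated assumptions, not the paper's construction: the covariance is taken to be diagonal (so the top $k$ eigenvectors are the first $k$ coordinates), the design is Gaussian, and all names (`min_norm_interpolant`, `split_head_tail`, `eigvals`) are hypothetical. The generalized-ridge form of the head component is obtained here by plain block elimination of the interpolation constraint. The effective rank is computed with the common definition $r_k = \sum_{i>k} \lambda_i / \lambda_{k+1}$, which is assumed to be the one the abstract refers to.

```python
import numpy as np

def min_norm_interpolant(X, y):
    """Minimum l2-norm solution of X @ beta = y (assumes X has full row rank)."""
    return X.T @ np.linalg.solve(X @ X.T, y)

def split_head_tail(X, y, k):
    """Rebuild the interpolant block by block.

    Assumption: the columns of X are ordered by decreasing eigenvalue of a
    diagonal covariance, so the first k columns span the top-k eigenspace.
    """
    X_head, X_tail = X[:, :k], X[:, k:]
    A = X_tail @ X_tail.T  # Gram matrix of the tail features
    A_inv_X_head = np.linalg.solve(A, X_head)
    A_inv_y = np.linalg.solve(A, y)
    # Head: ridge-type estimator, minimizing
    # (y - X_head b)^T A^{-1} (y - X_head b) + ||b||^2 over b.
    beta_head = np.linalg.solve(
        X_head.T @ A_inv_X_head + np.eye(k), X_head.T @ A_inv_y
    )
    # Tail: minimum-norm interpolant of the residual left by the head.
    beta_tail = X_tail.T @ np.linalg.solve(A, y - X_head @ beta_head)
    return beta_head, beta_tail

rng = np.random.default_rng(0)
n, p, k = 20, 200, 3
# Hypothetical spectrum: k large eigenvalues followed by a flat tail.
eigvals = np.concatenate([np.full(k, 50.0), np.full(p - k, 0.5)])
X = rng.standard_normal((n, p)) * np.sqrt(eigvals)  # rows ~ N(0, diag(eigvals))
beta_star = np.zeros(p)
beta_star[:k] = 1.0
y = X @ beta_star + 0.1 * rng.standard_normal(n)  # noisy responses

beta_hat = min_norm_interpolant(X, y)
beta_head, beta_tail = split_head_tail(X, y, k)

# The estimator fits the noisy data exactly.
interpolates = np.allclose(X @ beta_hat, y)
# Stacking the two components recovers the full interpolant.
decomposition_matches = np.allclose(np.concatenate([beta_head, beta_tail]), beta_hat)
# Effective rank of the tail spectrum (assumed definition, see lead-in).
effective_rank = eigvals[k:].sum() / eigvals[k]
```

With a flat tail, the effective rank equals the number of tail directions, $p - k$, which is much larger than $n$ in this toy setting. That is the kind of regime in which the tail Gram matrix is expected to be close to a multiple of the identity, so that the head component behaves like an ordinary ridge estimator.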

Distribution Free Uncertainty for the Minimum Norm Solution of Over-parameterized Linear Regression

Quantifying the Prediction Uncertainty of Machine Learning Models for Individual Data

Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression

The Generalization Error of the Minimum-norm Solutions for Over-parameterized Neural Networks

Taking a Moment for Distributional Robustness

On the Generalization Properties of Minimum-norm Solutions for Over-parameterized Neural Network Models

A geometrical viewpoint on the benign overfitting property of the minimum $\ell_2$-norm interpolant estimator and its universality

Minimum-Norm Interpolation Under Covariate Shift

An Information-Theoretic Learning Model Based on Importance Sampling

Optimal Excess Risk Bounds for Empirical Risk Minimization on $p$-Norm Linear Regression

Benign overfitting in linear regression

Asymptotic Normality and Confidence Intervals for Prediction Risk of the Min-Norm Least Squares Estimator

A Novel Regression Loss for Non-Parametric Uncertainty Optimization

On the robustness of minimum norm interpolators and regularized empirical risk minimizers

Minimax Regret Optimization for Robust Machine Learning under Distribution Shift

Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes

Anytime-Valid Generalized Universal Inference on Risk Minimizers

Sub-optimality of the Naive Mean Field approximation for proportional high-dimensional Linear Regression

Distribution-Free Robust Linear Regression

Distribution-free risk assessment of regression-based machine learning algorithms

Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks