Abstract: In the linear regression model, the minimum $\ell_2$-norm interpolant estimator has received much attention since it was proved to be consistent even though it fits noisy data perfectly, under some condition on the covariance matrix of the input vector; this phenomenon is known as benign overfitting. Motivated by this phenomenon, we study the generalization property of this estimator from a geometrical viewpoint. Our main results extend and improve the convergence rates as well as the deviation probability from (Tsigler et al. in J Mach Learn Res 24(123):1–76, 2021). Our proof differs from the classical bias/variance analysis and is based on the self-induced regularization property introduced in [5]: the estimator can be written as a sum of a ridge estimator and an overfitting component, following a decomposition of the feature space into the space spanned by the top k eigenvectors of the covariance matrix and the space spanned by the remaining ones. We also prove a matching lower bound for the expected prediction risk, and thus obtain necessary and sufficient conditions for benign overfitting of this estimator. The two geometrical properties of random Gaussian matrices at the heart of our analysis are the Dvoretzky-Milman theorem and the isomorphic and restricted isomorphic properties. In particular, the Dvoretzky dimension, which appears naturally in our geometrical viewpoint, coincides with the effective rank from (Bartlett et al. in Proc Natl Acad Sci 117(48), 30063–30070, 2020), (Tsigler et al. in J Mach Learn Res 24(123):1–76, 2021) and is the key tool for handling the behavior of the design matrix restricted to the subspace where overfitting happens. We extend these results to heavy-tailed scenarios, proving the universality of this phenomenon beyond exponential moment assumptions. This universality was previously unknown and is widely believed to be a significant challenge. It follows from an anisotropic version of the probabilistic Dvoretzky-Milman theorem that holds for heavy-tailed vectors, which is of independent interest.
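As a rough illustration of the objects named in the abstract, the following NumPy sketch computes a minimum $\ell_2$-norm interpolant and an effective rank. It is not taken from the paper: the function names, the simulated spectrum, the sample sizes, and the noise level are all assumptions chosen for illustration, and the effective rank is taken in the common form $r_k(\Sigma) = \sum_{i>k} \lambda_i / \lambda_{k+1}$.

```python
import numpy as np


def min_norm_interpolant(X, y):
    """Minimum l2-norm solution of X @ beta = y, computed via the pseudo-inverse."""
    return np.linalg.pinv(X) @ y


def effective_rank(eigvals, k):
    """Effective rank r_k = (sum of eigenvalues beyond the top k) / (the (k+1)-th eigenvalue)."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # sort in decreasing order
    return lam[k:].sum() / lam[k]


# Hypothetical overparameterized setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n, p = 20, 200                                   # fewer samples than features
eigvals = 1.0 / np.arange(1, p + 1)              # assumed covariance spectrum
X = rng.standard_normal((n, p)) * np.sqrt(eigvals)  # Gaussian design with diagonal covariance
beta_star = np.zeros(p)
beta_star[0] = 1.0                               # assumed true signal
y = X @ beta_star + 0.1 * rng.standard_normal(n)    # noisy responses

beta_hat = min_norm_interpolant(X, y)
fits_exactly = np.allclose(X @ beta_hat, y)      # the estimator interpolates the noisy data
print(fits_exactly, effective_rank(eigvals, k=5))
```

The pseudo-inverse is used here only because it returns the minimum-norm solution directly when the system is underdetermined; any solver that returns the same solution would serve equally well.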

Strong inductive biases provably prevent harmless interpolation

Harmless interpolation of noisy data in regression

Malign Overfitting: Interpolation Can Provably Preclude Invariance

Harmless interpolation in regression and classification with structured features

Kernel interpolation generalizes poorly

Benign, Tempered, or Catastrophic: A Taxonomy of Overfitting

Generalization error of min-norm interpolators in transfer learning

Mind the spikes: Benign overfitting of kernels and neural networks in fixed dimension

Towards an Understanding of Benign Overfitting in Neural Networks

Minimum-Norm Interpolation Under Covariate Shift

Minimum $\ell_1$-Norm Interpolators: Precise Asymptotics and Multiple Descent

How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers

Benign overfitting in linear regression

The Implicit Bias of Benign Overfitting

Near-Interpolators: Rapid Norm Growth and the Trade-Off between Interpolation and Generalization

A geometrical viewpoint on the benign overfitting property of the minimum $\ell_2$-norm interpolant estimator and its universality

Benefit of Interpolation in Nearest Neighbor Algorithms

Generalization in Kernel Regression Under Realistic Assumptions

On Optimal Interpolation In Linear Regression

Analysis of Interpolating Regression Models and the Double Descent Phenomenon