Abstract: In the linear regression model, the minimum $\ell_2$-norm interpolant estimator has received much attention since it was proved to be consistent, even though it fits noisy data perfectly, under some condition on the covariance matrix of the input vector; this phenomenon is known as benign overfitting. Motivated by this phenomenon, we study the generalization property of this estimator from a geometrical viewpoint. Our main results extend and improve the convergence rates as well as the deviation probability from (Tsigler et al. in J Mach Learn Res 24(123):1–76, 2021). Our proof differs from the classical bias/variance analysis and is based on the self-induced regularization property introduced in [5]: the estimator can be written as a sum of a ridge estimator and an overfitting component, following a decomposition of the feature space into the space spanned by the top k eigenvectors of the covariance matrix and the space spanned by the remaining ones. We also prove a matching lower bound for the expected prediction risk and thus obtain necessary and sufficient conditions for benign overfitting of this estimator. The two geometrical properties of random Gaussian matrices at the heart of our analysis are the Dvoretsky-Milman theorem and the isomorphic and restricted isomorphic properties. In particular, the Dvoretsky dimension, which appears naturally in our geometrical viewpoint, coincides with the effective rank from (Bartlett Proc Natl Acad Sci 117(48), 30063–30070, 2020), (Tsigler et al. in J Mach Learn Res 24(123):1–76, 2021) and is the key tool for handling the behavior of the design matrix restricted to the subspace where overfitting happens. We extend these results to heavy-tailed scenarios, proving the universality of this phenomenon beyond exponential moment assumptions. This universality was previously unknown, and establishing it is widely believed to be a significant challenge. It follows from an anisotropic version of the probabilistic Dvoretsky-Milman theorem that holds for heavy-tailed vectors, which is of independent interest.
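
To make the objects named in the abstract concrete, here is a minimal NumPy sketch, not the paper's own construction. It computes a minimum $\ell_2$-norm interpolant through the pseudoinverse and evaluates the tail effective rank in the form commonly attributed to Bartlett et al., namely the sum of the eigenvalues after the top k divided by the (k+1)-th largest. The function names, the spectrum, and the sizes n and p are all illustrative assumptions.

```python
import numpy as np


def min_norm_interpolant(X, y):
    """Minimum l2-norm solution of X @ beta = y.

    Assumes X has full row rank (n <= p), so an exact interpolant exists.
    """
    return np.linalg.pinv(X) @ y


def effective_rank(eigvals, k):
    """Sum of the eigenvalues after the top k, divided by the (k+1)-th largest."""
    tail = np.sort(np.asarray(eigvals, dtype=float))[::-1][k:]
    return tail.sum() / tail[0]


# Hypothetical setup: Gaussian design with a diagonal covariance matrix.
rng = np.random.default_rng(0)
n, p = 20, 200
eigvals = 1.0 / np.arange(1, p + 1)                  # assumed spectrum
X = rng.standard_normal((n, p)) * np.sqrt(eigvals)   # rows ~ N(0, diag(eigvals))
beta_star = rng.standard_normal(p)
y = X @ beta_star + 0.1 * rng.standard_normal(n)     # noisy responses

beta_hat = min_norm_interpolant(X, y)

# Any other interpolant differs from beta_hat by a vector in the null space of X.
v = rng.standard_normal(p)
v_null = v - np.linalg.pinv(X) @ (X @ v)             # projection of v onto null(X)
other = beta_hat + v_null

print("max interpolation residual:", np.abs(X @ beta_hat - y).max())
print("norms (min-norm vs. other):", np.linalg.norm(beta_hat), np.linalg.norm(other))
print("effective rank after k=5:", effective_rank(eigvals, 5))
```

The second interpolant is built by adding a null-space vector of the design matrix, which leaves the fit unchanged but can only increase the norm; this is the sense in which the pseudoinverse solution is the minimum-norm one.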

A geometrical viewpoint on the benign overfitting property of the minimum $l_2$-norm interpolant estimator and its universality

Benign overfitting in linear regression

Malign Overfitting: Interpolation Can Provably Preclude Invariance

Towards an Understanding of Benign Overfitting in Neural Networks

A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-$\ell_1$-Norm Interpolated Classifiers

Minimum-Norm Interpolation Under Covariate Shift

The Implicit Bias of Benign Overfitting

Optimistic Rates: A Unifying Theory for Interpolation Learning and Regularization in Linear Regression

Strong inductive biases provably prevent harmless interpolation

Towards Optimal Problem Dependent Generalization Error Bounds in Statistical Learning Theory

Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes

Benign Overfitting in Linear Classifiers and Leaky ReLU Networks from KKT Conditions for Margin Maximization

Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression

Benign overfitting and adaptive nonparametric regression

Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression

Generalization error of min-norm interpolators in transfer learning

On the robustness of minimum norm interpolators and regularized empirical risk minimizers

The distribution of Ridgeless least squares interpolators

Mind the spikes: Benign overfitting of kernels and neural networks in fixed dimension

On the Universality of the Double Descent Peak in Ridgeless Regression