Abstract: In the linear regression model, the minimum $\ell_2$-norm interpolant estimator $\hat{\beta}$ has received much attention since it was proved to be consistent even though it fits noisy data perfectly, under some condition on the covariance matrix $\Sigma$ of the input vector; this phenomenon is known as benign overfitting. Motivated by this phenomenon, we study the generalization property of this estimator from a geometrical viewpoint. Our main results extend and improve the convergence rates as well as the deviation probability from (Tsigler et al. in J Mach Learn Res 24(123):1–76, 2021). Our proof differs from the classical bias/variance analysis and is based on the self-induced regularization property introduced in [5]: $\hat{\beta}$ can be written as a sum of a ridge estimator and an overfitting component, following a decomposition of the feature space into the space spanned by the top $k$ eigenvectors of $\Sigma$ and the space spanned by the remaining ones. We also prove a matching lower bound for the expected prediction risk and thus obtain necessary and sufficient conditions for benign overfitting of $\hat{\beta}$. The two geometrical properties of random Gaussian matrices at the heart of our analysis are the Dvoretzky-Milman theorem and the isomorphic and restricted isomorphic properties. In particular, the Dvoretzky dimension, which appears naturally in our geometrical viewpoint, coincides with the effective rank from (Bartlett et al. in Proc Natl Acad Sci 117(48):30063–30070, 2020) and (Tsigler et al. in J Mach Learn Res 24(123):1–76, 2021), and is the key tool for handling the behavior of the design matrix restricted to the subspace where overfitting happens. We extend these results to heavy-tailed scenarios, proving the universality of this phenomenon beyond exponential moment assumptions. This universality was previously unknown, and establishing it is widely believed to be a significant challenge. It follows from an anisotropic version of the probabilistic Dvoretzky-Milman theorem that holds for heavy-tailed vectors, which is of independent interest.
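As a rough illustration of the two objects named in the abstract, the sketch below computes a minimum $\ell_2$-norm interpolant with a pseudoinverse and an effective rank in the form $r_k(\Sigma) = \sum_{i>k} \lambda_i / \lambda_{k+1}$, which is how this quantity is usually written following Bartlett et al. The function names, the polynomially decaying spectrum, the sample sizes, and the noise level are all assumptions made for the example; none of them come from the paper, and the code does not reproduce its proof technique.

```python
import numpy as np


def min_norm_interpolant(X, y):
    """Minimum l2-norm solution of X @ beta = y (assumes X has full row rank)."""
    return np.linalg.pinv(X) @ y


def effective_rank(eigenvalues, k):
    """Sum of the eigenvalues after the top k, divided by the (k+1)-th largest."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    tail = lam[k:]
    return tail.sum() / tail[0]


rng = np.random.default_rng(0)
n, p = 20, 200                                # overparameterized regime: p > n
eigenvalues = 1.0 / np.arange(1, p + 1)       # hypothetical spectrum of Sigma
X = rng.standard_normal((n, p)) * np.sqrt(eigenvalues)  # rows ~ N(0, diag(eigenvalues))
beta_star = np.zeros(p)
beta_star[:3] = 1.0                           # hypothetical true signal
y = X @ beta_star + 0.1 * rng.standard_normal(n)        # noisy labels

beta_hat = min_norm_interpolant(X, y)
interpolates = np.allclose(X @ beta_hat, y)   # fits the noisy data exactly

# Adding any null-space direction of X gives another interpolant with a larger norm.
null_dir = rng.standard_normal(p)
null_dir -= np.linalg.pinv(X) @ (X @ null_dir)  # remove the row-space component
other = beta_hat + null_dir
```

Here `other` also interpolates the data, but its norm exceeds that of `beta_hat`, which is what singles out the minimum-norm solution among all interpolants.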

Minimum $\ell_1$-Norm Interpolators: Precise Asymptotics and Multiple Descent

Near-Interpolators: Rapid Norm Growth and the Trade-Off between Interpolation and Generalization

On the robustness of minimum norm interpolators and regularized empirical risk minimizers

Minimum norm interpolation by perceptra: Explicit regularization and implicit bias

Analysis of Interpolating Regression Models and the Double Descent Phenomenon

A geometrical viewpoint on the benign overfitting property of the minimum $l_2$-norm interpolant estimator and its universality

A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-$\ell_1$-Norm Interpolated Classifiers

Minimum-Norm Interpolation Under Covariate Shift

Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression

On Optimal Interpolation In Linear Regression

Generalization error of min-norm interpolators in transfer learning

The distribution of Ridgeless least squares interpolators

Surprises in high-dimensional ridgeless least squares interpolation

Harmless interpolation of noisy data in regression

Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression

Algebraic and Statistical Properties of the Ordinary Least Squares Interpolator

Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes

Optimistic Rates: A Unifying Theory for Interpolation Learning and Regularization in Linear Regression

Linear Convergence of Inexact Descent Method and Inexact Proximal Gradient Algorithms for Lower-Order Regularization Problems

Strong inductive biases provably prevent harmless interpolation