Abstract: In the linear regression model, the minimum $\ell_2$-norm interpolant estimator $\hat\beta$ has received much attention since it was proved to be consistent even though it fits noisy data perfectly, under some condition on the covariance matrix $\Sigma$ of the input vector; this phenomenon is known as benign overfitting. Motivated by this phenomenon, we study the generalization property of this estimator from a geometrical viewpoint. Our main results extend and improve the convergence rates as well as the deviation probability from (Tsigler et al. in J Mach Learn Res 24(123):1–76, 2021). Our proof differs from the classical bias/variance analysis and is based on the self-induced regularization property introduced in [5]: $\hat\beta$ can be written as a sum of a ridge estimator and an overfitting component, which follows a decomposition of the feature space into the space spanned by the top k eigenvectors of $\Sigma$ and the space spanned by the last ones. We also prove a matching lower bound for the expected prediction risk and thus obtain necessary and sufficient conditions for benign overfitting of $\hat\beta$. The two geometrical properties of random Gaussian matrices at the heart of our analysis are the Dvoretzky-Milman theorem and the isomorphic and restricted isomorphic properties. In particular, the Dvoretzky dimension, which appears naturally in our geometrical viewpoint, coincides with the effective rank from (Bartlett Proc Natl Acad Sci 117(48), 30063–30070, 2020), (Tsigler et al. in J Mach Learn Res 24(123):1–76, 2021) and is the key tool for handling the behavior of the design matrix restricted to the subspace where overfitting happens. We extend these results to heavy-tailed scenarios, proving the universality of this phenomenon beyond exponential moment assumptions. This universality was previously unknown and is widely believed to be a significant challenge. It follows from an anisotropic version of the probabilistic Dvoretzky-Milman theorem that holds for heavy-tailed vectors, which is of independent interest.
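As a rough illustration of the estimator discussed in the abstract, the sketch below computes a minimum $\ell_2$-norm interpolant with NumPy and checks that it fits noisy data exactly while having a smaller norm than another interpolant. It is not taken from the paper: the sample sizes, the decaying spectrum `eigvals`, the signal `beta_star`, the noise level, and the helper name `min_norm_interpolant` are all assumptions chosen for the example.

```python
import numpy as np

def min_norm_interpolant(X, y):
    """Minimum l2-norm solution of X @ beta = y.

    Assumes n <= p and X of full row rank, so that
    pinv(X) @ y equals X^T (X X^T)^{-1} y.
    """
    return np.linalg.pinv(X) @ y

rng = np.random.default_rng(0)
n, p = 20, 200                      # overparametrized regime: p > n (assumed sizes)

# Hypothetical covariance spectrum: diagonal Sigma with decaying eigenvalues.
eigvals = 1.0 / np.arange(1, p + 1)
X = rng.standard_normal((n, p)) * np.sqrt(eigvals)   # rows ~ N(0, Sigma)

beta_star = np.zeros(p)
beta_star[:5] = 1.0                 # illustrative signal on the top directions
y = X @ beta_star + 0.5 * rng.standard_normal(n)     # noisy responses

beta_hat = min_norm_interpolant(X, y)

# Build a second interpolant by adding a vector from the null space of X.
null_dir = rng.standard_normal(p)
null_dir -= np.linalg.pinv(X) @ (X @ null_dir)       # remove the row-space part
beta_other = beta_hat + null_dir

print("fits data exactly:", np.allclose(X @ beta_hat, y))
print("norm of min-norm interpolant:", np.linalg.norm(beta_hat))
print("norm of other interpolant:   ", np.linalg.norm(beta_other))
```

Both vectors interpolate the noisy responses, but `beta_hat` lies in the row space of `X`, so adding any null-space component can only increase the norm.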

Tractability from overparametrization: the example of the negative perceptron

Typical and atypical solutions in non-convex neural networks with discrete and continuous weights

On the Atypical Solutions of the Symmetric Binary Perceptron

Discrepancy Algorithms for the Binary Perceptron

On Accelerated Perceptrons and Beyond

Generalization ability of a perceptron with non-monotonic transfer function

The star-shaped space of solutions of the spherical negative perceptron

A geometrical viewpoint on the benign overfitting property of the minimum $\ell_2$-norm interpolant estimator and its universality

Clustering of solutions in the symmetric binary perceptron

Phase Transitions in Transfer Learning for High-Dimensional Perceptrons

Capacity Lower Bound for the Ising Perceptron

Harmless Overparametrization in Two-layer Neural Networks

Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality

Fl RDT based ultimate lowering of the negative spherical perceptron capacity

Over-parametrized neural networks as under-determined linear systems

Rethinking Breiman's Dilemma in Neural Networks: Phase Transitions of Margin Dynamics

Tractability of non-homogeneous tensor product problems in the worst case setting

Benign Overfitting in Deep Neural Networks under Lazy Training

Benign Overfitting in Linear Classifiers and Leaky ReLU Networks from KKT Conditions for Margin Maximization

Linear Separation via Optimism