Abstract: A continuing mystery in understanding the empirical success of deep neural networks is their ability to achieve zero training error and generalize well, even when the training data is noisy and there are more parameters than data points. We investigate this overparameterized regime in linear regression, where all solutions that minimize training error interpolate the data, including noise. We characterize the fundamental generalization (mean-squared) error of any interpolating solution in the presence of noise, and show that this error decays to zero with the number of features. Thus, overparameterization can be explicitly beneficial in ensuring harmless interpolation of noise. We discuss two root causes for poor generalization that are complementary in nature -- signal "bleeding" into a large number of alias features, and overfitting of noise by parsimonious feature selectors. For the sparse linear model with noise, we provide a hybrid interpolating scheme that mitigates both these issues and achieves order-optimal MSE over all possible interpolating solutions.

What problem does this paper attempt to address?

This paper asks how an overparameterized regression model can interpolate noisy training data harmlessly, that is, fit the noise without harming generalization. The traditional view is that a model with more parameters than data points will overfit the noise and generalize poorly. The recent success of deep neural networks challenges this view: these models generalize well even when the training data is noisy and the number of parameters far exceeds the number of data points. Specifically, the paper focuses on the overparameterized regime in linear regression and studies the fundamental generalization (mean-squared) error of all solutions that minimize the training error (i.e., interpolating solutions) in the presence of noise. Through theoretical analysis, the authors show that this error decays to zero as the number of features grows. Overparameterization therefore need not harm generalization and can, under certain conditions, be explicitly beneficial in ensuring harmless interpolation of noise. The main contributions of the paper include:
1. It characterizes the fundamental generalization error of any interpolating solution in the presence of noise and shows that this error decays to zero as the number of features increases.
2. It provides a Fourier-theoretic explanation of the behavior of the minimum \( \ell_2 \)-norm interpolating solution.
3. It points out that some interpolating solutions (such as the minimum \( \ell_1 \)-norm interpolating solution and related solutions) overfit pure noise.
4. It constructs a two-step hybrid interpolation scheme that fits the noise harmlessly while recovering the signal and achieves order-optimal test mean-squared error (MSE) among all possible interpolating solutions.

Through these analyses, the paper reveals the potential advantages of overparameterized models in dealing with noisy data and provides a method for achieving harmless interpolation in high-dimensional linear regression.
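
To make the notion of an interpolating solution concrete, the following is a minimal sketch of the minimum \( \ell_2 \)-norm interpolator in an overparameterized linear regression. When the feature matrix \( A \) has more columns than rows and full row rank, this solution has the closed form \( \hat{\alpha} = A^\top (A A^\top)^{-1} y \), which the pseudoinverse computes. The Gaussian features, the sizes `n`, `d`, `k`, the noise level, and the function name `min_l2_norm_interpolator` are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np


def min_l2_norm_interpolator(A, y):
    """Minimum l2-norm solution of A @ alpha = y for a wide matrix A.

    For full-row-rank A this equals A.T @ inv(A @ A.T) @ y.
    """
    return np.linalg.pinv(A) @ y


# Hypothetical synthetic setup: n noisy samples, d >> n features,
# and a k-sparse true signal (all values chosen only for illustration).
rng = np.random.default_rng(0)
n, d, k = 50, 2000, 5
A = rng.standard_normal((n, d))
alpha_true = np.zeros(d)
alpha_true[:k] = 1.0
y = A @ alpha_true + 0.5 * rng.standard_normal(n)

alpha_hat = min_l2_norm_interpolator(A, y)

# The solution fits the noisy labels exactly (up to floating-point error).
train_residual = np.max(np.abs(A @ alpha_hat - y))

# Any other interpolator differs by a null-space component of A,
# which can only increase the l2 norm.
v = rng.standard_normal(d)
null_component = v - np.linalg.pinv(A) @ (A @ v)
alpha_other = alpha_hat + null_component

print("max training residual:", train_residual)
print("norm of min-norm solution:", np.linalg.norm(alpha_hat))
print("norm of another interpolator:", np.linalg.norm(alpha_other))
```

The second interpolator shows that zero training error does not pin down a unique solution in this regime; the choice among interpolators is what determines test error, which is the distinction the summary draws between \( \ell_2 \)- and \( \ell_1 \)-norm solutions.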

Harmless interpolation of noisy data in regression

Harmless interpolation in regression and classification with structured features

Strong inductive biases provably prevent harmless interpolation

Generalization error of min-norm interpolators in transfer learning

Minimum $\ell_{1}$-Norm Interpolators: Precise Asymptotics and Multiple Descent

Towards an Understanding of Benign Overfitting in Neural Networks

Benefit of Interpolation in Nearest Neighbor Algorithms

Near-Interpolators: Rapid Norm Growth and the Trade-Off between Interpolation and Generalization

Benign overfitting in linear regression

Kernel interpolation generalizes poorly

Benign, Tempered, or Catastrophic: A Taxonomy of Overfitting

How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers

Minimum-Norm Interpolation Under Covariate Shift

Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression

Malign Overfitting: Interpolation Can Provably Preclude Invariance

Analysis of Interpolating Regression Models and the Double Descent Phenomenon

Surprises in high-dimensional ridgeless least squares interpolation

Benign Overfitting in Deep Neural Networks under Lazy Training

Mind the spikes: Benign overfitting of kernels and neural networks in fixed dimension

On Optimal Interpolation In Linear Regression

DeepNNK: Explaining deep models and their generalization using polytope interpolation