Harmless interpolation of noisy data in regression

Vidya Muthukumar, Kailas Vodrahalli, Vignesh Subramanian, Anant Sahai
DOI: https://doi.org/10.48550/arXiv.1903.09139
2019-09-09
Abstract: A continuing mystery in understanding the empirical success of deep neural networks is their ability to achieve zero training error and generalize well, even when the training data is noisy and there are more parameters than data points. We investigate this overparameterized regime in linear regression, where all solutions that minimize training error interpolate the data, including noise. We characterize the fundamental generalization (mean-squared) error of any interpolating solution in the presence of noise, and show that this error decays to zero with the number of features. Thus, overparameterization can be explicitly beneficial in ensuring harmless interpolation of noise. We discuss two root causes for poor generalization that are complementary in nature -- signal "bleeding" into a large number of alias features, and overfitting of noise by parsimonious feature selectors. For the sparse linear model with noise, we provide a hybrid interpolating scheme that mitigates both these issues and achieves order-optimal MSE over all possible interpolating solutions.
Machine Learning
What problem does this paper attempt to address?
This paper asks how an overparameterized regression model can interpolate noisy training data harmlessly, that is, fit the noise exactly without hurting generalization. The traditional view holds that a model with more parameters than data points will overfit the noise and generalize poorly. The empirical success of deep neural networks challenges this view: such models generalize well even when the training data is noisy and the number of parameters far exceeds the number of data points.

The paper studies this overparameterized regime in linear regression, where every solution that minimizes the training error interpolates the data, noise included. It characterizes the fundamental generalization (mean-squared) error of any interpolating solution in the presence of noise and shows that this error decays to zero as the number of features grows. Overparameterization therefore need not harm generalization and can be explicitly beneficial for harmless interpolation of noise.

The main contributions of the paper are:
1. It characterizes the fundamental generalization error of any interpolating solution in the presence of noise and shows that this error decays to zero as the number of features increases.
2. It gives a Fourier-theoretic explanation of the behavior of the minimum \( \ell_2 \)-norm interpolating solution.
3. It shows that some interpolating solutions, such as the minimum \( \ell_1 \)-norm interpolating solution and related parsimonious feature selectors, overfit pure noise.
4. It constructs a two-step hybrid interpolation scheme for the sparse linear model with noise that recovers the signal while fitting the noise harmlessly, and that achieves order-optimal test mean-squared error (MSE) over all possible interpolating solutions.

Together, these results show the potential advantages of overparameterized models on noisy data and provide a method for harmless interpolation in high-dimensional linear regression.
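The claim that the cost of interpolating noise shrinks as features are added can be illustrated numerically. The sketch below is not the paper's experiment: it assumes i.i.d. standard Gaussian features and a zero true signal (pure noise labels), and the function names `min_l2_interpolator` and `noise_fit_energy` are hypothetical. Under those assumptions the test MSE of the minimum \( \ell_2 \)-norm interpolator on a fresh isotropic feature vector equals the squared norm of the fitted coefficients, so it is enough to track that norm as the number of features `d` grows with the number of samples `n` fixed.

```python
import numpy as np


def min_l2_interpolator(X, y):
    """Minimum l2-norm solution of X @ alpha = y, for X with more columns than rows."""
    # The pseudoinverse picks the smallest-norm solution among all interpolators.
    return np.linalg.pinv(X) @ y


def noise_fit_energy(n, d, sigma=1.0, trials=20, seed=0):
    """Average squared norm of the min-l2 interpolator fitted to pure noise.

    Assumption: features are i.i.d. N(0, 1) and the true signal is zero, so
    this squared norm is the test MSE on a fresh isotropic feature vector.
    """
    rng = np.random.default_rng(seed)
    energies = []
    for _ in range(trials):
        X = rng.standard_normal((n, d))      # n samples, d > n features
        w = sigma * rng.standard_normal(n)   # labels are noise only
        alpha = min_l2_interpolator(X, w)
        assert np.allclose(X @ alpha, w)     # the noise is fitted exactly
        energies.append(float(alpha @ alpha))
    return float(np.mean(energies))


if __name__ == "__main__":
    n = 50
    for d in (100, 400, 1600, 6400):
        # For isotropic Gaussian features the expectation is sigma^2 * n / (d - n - 1).
        print(d, noise_fit_energy(n, d), n / (d - n - 1))
```

In this toy setting the printed energy should fall roughly in proportion to n / d once d is much larger than n, while the training error stays at zero throughout. The sketch isolates only the noise-fitting term; it says nothing about signal "bleeding" into alias features, which the paper treats as a separate cause of poor generalization.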