Abstract: In recent years, machine learning models have achieved success under the assumption that data are independent and identically distributed (i.i.d.). However, this assumption is easily violated in real-world applications, leading to the Out-of-Distribution (OOD) problem. Understanding how modern over-parameterized DNNs behave under non-trivial natural distributional shifts is essential, yet current theoretical understanding is insufficient. Existing theoretical works often provide meaningless results for over-parameterized models in OOD scenarios, or even contradict empirical findings. To this end, we investigate the OOD generalization performance of over-parameterized models under general benign overfitting conditions. Our analysis focuses on a random feature model and examines non-trivial natural distributional shifts, under which benign overfitting estimators incur a constant excess OOD loss despite achieving zero excess in-distribution (ID) loss. We demonstrate that in this scenario, further increasing the model's parameterization can significantly reduce the OOD loss. Intuitively, the variance term of the ID loss remains low because long-tail features are orthogonal, meaning that overfitting noise during training generally does not raise the test loss. In OOD cases, however, distributional shift increases the variance term. Fortunately, the inherent shift is unrelated to individual x, which preserves the orthogonality of long-tail features. Expanding the hidden dimension further improves this orthogonality by mapping the features into higher-dimensional spaces, thereby reducing the variance term. We further show that model ensembles also improve the OOD loss, akin to increasing model capacity. These insights explain the empirical phenomenon of enhanced OOD generalization through model ensembles, and simulations consistent with the theoretical results support them.
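The setting described in the abstract can be probed numerically with a minimal sketch such as the one below. It is not the authors' experimental setup: the ReLU feature map, the linear ground truth, the coordinate-rescaling shift, and all names and default values (`relu_features`, `fit_min_norm`, `run_trial`, `shift`, `noise`) are assumptions chosen for illustration only. It fits minimum-norm interpolating random feature models of width `m`, optionally averages `n_models` of them as an ensemble, and returns the excess ID and OOD squared losses.

```python
import numpy as np


def relu_features(X, W):
    """Map inputs X (n, d) to random ReLU features (n, m) using fixed weights W (m, d)."""
    return np.maximum(X @ W.T, 0.0) / np.sqrt(W.shape[0])


def fit_min_norm(Phi, y):
    """Minimum-norm least-squares solution; it interpolates y when Phi has full row rank."""
    return np.linalg.pinv(Phi) @ y


def run_trial(m, n_models=1, n=50, d=20, noise=0.5, shift=2.0, n_test=2000, seed=0):
    """Return (excess ID loss, excess OOD loss) for an ensemble of random feature models.

    All modelling choices here are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=d) / np.sqrt(d)            # assumed linear ground truth
    X_train = rng.normal(size=(n, d))
    y_train = X_train @ beta + noise * rng.normal(size=n)

    X_id = rng.normal(size=(n_test, d))               # same distribution as training inputs
    scales = np.ones(d)
    scales[: d // 2] = shift                          # assumed shift: rescale half the coordinates
    X_ood = rng.normal(size=(n_test, d)) * scales

    pred_id = np.zeros(n_test)
    pred_ood = np.zeros(n_test)
    for _ in range(n_models):
        W = rng.normal(size=(m, d)) / np.sqrt(d)      # fresh random first layer per ensemble member
        theta = fit_min_norm(relu_features(X_train, W), y_train)
        pred_id += relu_features(X_id, W) @ theta
        pred_ood += relu_features(X_ood, W) @ theta
    pred_id /= n_models                               # ensemble = average of member predictions
    pred_ood /= n_models

    # Excess loss: squared error against the noiseless target.
    id_loss = float(np.mean((pred_id - X_id @ beta) ** 2))
    ood_loss = float(np.mean((pred_ood - X_ood @ beta) ** 2))
    return id_loss, ood_loss


if __name__ == "__main__":
    for m in (100, 400, 1600):
        print(m, run_trial(m=m), run_trial(m=m, n_models=5))
```

Sweeping `m` or `n_models` and comparing the returned losses is one way to explore the trend the abstract describes. The sketch itself makes no claim about the resulting numbers, which depend on the assumed shift and noise level.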

Over Parameterized Two-level Neural Networks Can Learn Near Optimal Feature Representations

Convex Formulation of Overparameterized Deep Neural Networks

Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks

How to Characterize The Landscape of Overparameterized Convolutional Neural Networks

Harmless Overparametrization in Two-layer Neural Networks

How Does Overparameterization Affect Features?

A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks

A Local Convergence Theory for Mildly Over-Parameterized Two-Layer Neural Network

Convergence Analysis for Over-Parameterized Deep Learning

Mathematical Models of Overparameterized Neural Networks

Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks

Over-parametrized neural networks as under-determined linear systems

Preconditioned Gradient Descent Finds Over-Parameterized Neural Networks with Sharp Generalization for Nonparametric Regression

A Linear Frequency Principle Model to Understand the Absence of Overfitting in Neural Networks

On the Benefits of Over-parameterization for Out-of-Distribution Generalization

Training Over-parameterized Deep ResNet is Almost As Easy As Training a Two-layer Network

On the Impact of Overparameterization on the Training of a Shallow Neural Network in High Dimensions

Stochastic Gradient Descent for Two-layer Neural Networks

Enhancing Accuracy and Parameter-Efficiency of Neural Representations for Network Parameterization

Feature Contamination: Neural Networks Learn Uncorrelated Features and Fail to Generalize

An effective algorithm for hyperparameter optimization of neural networks