Abstract: Recent works report an intriguing phenomenon, the Frequency Principle (F-Principle): deep neural networks (DNNs) fit the target function from low to high frequency during training, which provides insight into the training and generalization behavior of DNNs in complex tasks. In this paper, through analysis of an infinite-width two-layer NN in the neural tangent kernel (NTK) regime, we derive the exact differential equation, namely the Linear Frequency-Principle (LFP) model, governing the evolution of the NN output function in the frequency domain during training. Our exact computation applies to general activation functions, with no assumptions on the size or distribution of the training data. The LFP model reveals that higher frequencies evolve polynomially or exponentially slower than lower frequencies, depending on the smoothness/regularity of the activation function. We further bridge the gap between training dynamics and generalization by proving that the LFP model implicitly minimizes a Frequency-Principle norm (FP-norm) of the learned function, in which each frequency is penalized in proportion to the inverse of its evolution rate, so that higher frequencies are penalized more severely. Finally, we derive an a priori generalization error bound controlled by the FP-norm of the target function, which provides a theoretical justification for the empirical observation that DNNs often generalize well for low-frequency functions.
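
To make the two qualitative claims of the abstract concrete, here is a minimal toy sketch. It is not the paper's exact LFP equation: it assumes a simplified setting in which each frequency mode decouples and relaxes at a rate gamma_k^2 with a polynomially decaying gamma_k = |k|^(-decay_power). The function names `toy_mode_residuals` and `toy_fp_norm_sq` and the parameter `decay_power` are hypothetical and chosen only for illustration.

```python
import numpy as np


def toy_mode_residuals(freqs, times, decay_power=2.0):
    """Relative residual of each frequency mode at each time.

    Assumed toy dynamics (NOT the paper's exact LFP equation):
        d r_k / dt = -gamma_k**2 * r_k,   gamma_k = |k| ** (-decay_power)
    whose closed-form solution is r_k(t) / r_k(0) = exp(-gamma_k**2 * t).
    Frequencies must be nonzero.
    """
    freqs = np.asarray(freqs, dtype=float)
    times = np.asarray(times, dtype=float)
    gamma_sq = np.abs(freqs) ** (-2.0 * decay_power)
    # Rows index times, columns index frequencies.
    return np.exp(-np.outer(times, gamma_sq))


def toy_fp_norm_sq(coeffs, freqs, decay_power=2.0):
    """Toy FP-norm squared: sum_k |c_k|**2 / gamma_k**2.

    Each mode is weighted by the inverse of its (assumed) evolution rate,
    so high-frequency content costs more.
    """
    coeffs = np.asarray(coeffs)
    freqs = np.asarray(freqs, dtype=float)
    inv_gamma_sq = np.abs(freqs) ** (2.0 * decay_power)
    return float(np.sum(np.abs(coeffs) ** 2 * inv_gamma_sq))


if __name__ == "__main__":
    freqs = [1, 2, 4]
    residuals = toy_mode_residuals(freqs, times=[0.0, 10.0])
    # At t = 10 the lowest frequency has almost converged,
    # while the highest frequency has barely moved.
    print(residuals[1])
    # Equal-amplitude modes are penalized increasingly with frequency.
    print(toy_fp_norm_sq([1.0, 1.0, 1.0], freqs))
```

Under these assumptions, a larger `decay_power` plays the role of a smoother activation function: the gap between the convergence rates of low and high frequencies widens, and the toy norm penalizes high-frequency components more heavily.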

A Linear Frequency Principle Model to Understand the Absence of Overfitting in Neural Networks

On the exact computation of linear frequency principle dynamics and its generalization

Explicitizing an Implicit Bias of the Frequency Principle in Two-layer Neural Networks

Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks

Theory of the Frequency Principle for General Deep Neural Networks

Training Behavior of Deep Neural Network in Frequency Domain

An Upper Limit of Decaying Rate with Respect to Frequency in Deep Neural Network

Frequency Principle in deep learning: an overview

Overview frequency principle/spectral bias in deep learning

Over-parametrized neural networks as under-determined linear systems

Over Parameterized Two-level Neural Networks Can Learn Near Optimal Feature Representations

Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks

Frequency Principle in Deep Learning with General Loss Functions and Its Potential Application

Functional Network: A Novel Framework for Interpretability of Deep Neural Networks

Towards an Understanding of Benign Overfitting in Neural Networks

A Novel Explanation Against Linear Neural Networks

Mathematical Models of Overparameterized Neural Networks

Frequency Principle in Deep Learning Beyond Gradient-descent-based Training

Benign overfitting in linear regression

Benign Overfitting in Deep Neural Networks under Lazy Training

A spectral method for a Fokker-Planck equation in neuroscience with applications in neural networks with learning rules