A note on regularised NTK dynamics with an application to PAC-Bayesian training

Eugenio Clerico,Benjamin Guedj

2023-12-21

Abstract: We establish explicit dynamics for neural networks whose training objective has a regularising term that constrains the parameters to remain close to their initial value. This keeps the network in a lazy training regime, where the dynamics can be linearised around the initialisation. The standard neural tangent kernel (NTK) governs the evolution during training in the infinite-width limit, although the regularisation yields an additional term in the differential equation describing the dynamics. This setting provides an appropriate framework to study the evolution of wide networks trained to optimise generalisation objectives such as PAC-Bayes bounds, and may hence contribute to a deeper theoretical understanding of such networks.
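
The origin of the additional term can be sketched as follows. This is a minimal derivation under assumptions that the abstract does not spell out: a squared-distance penalty with strength \lambda, training by gradient flow, and notation chosen here for illustration (f_t for the vector of network outputs on the training inputs at time t, J_t for its Jacobian with respect to the parameters, \Theta for the limiting NTK). It may differ from the paper's own notation.

```latex
\begin{align*}
\hat{\mathcal{L}}(\theta)
  &= \mathcal{L}(f_\theta) + \frac{\lambda}{2}\,\|\theta - \theta_0\|^2
  && \text{(regularised objective, assumed form)} \\
\dot{\theta}_t
  &= -J_t^\top \nabla_f \mathcal{L}(f_t) - \lambda\,(\theta_t - \theta_0)
  && \text{(gradient flow)} \\
\dot{f}_t
  &= J_t\,\dot{\theta}_t
   = -J_t J_t^\top \nabla_f \mathcal{L}(f_t) - \lambda\, J_t\,(\theta_t - \theta_0)
  && \text{(chain rule)} \\
  &\approx -\Theta\, \nabla_f \mathcal{L}(f_t) - \lambda\,(f_t - f_0)
  && \text{(lazy regime)}
\end{align*}
```

The last step uses the linearisation f_t \approx f_0 + J_0 (\theta_t - \theta_0) together with J_t J_t^\top \approx \Theta. Setting \lambda = 0 recovers the usual unregularised NTK dynamics; for \lambda > 0 the extra term pulls the outputs back towards their values at initialisation.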

Machine Learning

What problem does this paper attempt to address?

The paper investigates how adding a regularization term to the training objective affects the training dynamics of neural networks, especially in the infinite-width limit. The focus is on L2 regularization that penalizes the distance between the parameters and their initial values, which keeps the network in a "lazy training" regime. In this regime the dynamics can be linearized around the initialization and described by the Neural Tangent Kernel (NTK), which becomes fixed and deterministic in the infinite-width limit.

The paper first reviews the NTK dynamics of infinitely wide networks without regularization. The authors then discuss how regularization, specifically L2 regularization, changes these dynamics. They show that the regularization does not hinder the linearization of the network dynamics; on the contrary, it forces the parameters to stay close to their initial values, consistent with lazy training. The analysis is then extended to more general regularization terms and applied to least squares regression.

The paper also discusses the application of regularized dynamics to PAC-Bayes training, in which a neural network is trained by optimizing a PAC-Bayes bound. The regularized setting thereby offers a route to a theoretical understanding of generalization in overparameterized networks.

In summary, the paper addresses how to analyze the training dynamics of infinitely wide neural networks under regularization, particularly L2 regularization, and how these dynamics relate to the network's generalization ability.
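
The least squares case mentioned above can be illustrated numerically. The following is a minimal sketch, not the paper's experiment: it assumes a model that is exactly linear in its parameters (standing in for a linearized wide network), a penalty (lam / 2) * ||theta - theta0||^2, and plain gradient descent as a discretization of gradient flow. The function name, the random Jacobian `J`, and all sizes are hypothetical. Under these assumptions the outputs follow df/dt = -K (f - y) - lam (f - f0) with K = J J^T, whose fixed point solves (K + lam I) f = K y + lam f0.

```python
import numpy as np


def simulate_regularised_least_squares(n=5, p=50, lam=0.5, lr=1e-2,
                                       steps=5000, seed=0):
    """Train a linear-in-parameters model with an L2 pull towards initialisation.

    Returns the trained outputs, the predicted fixed point, the labels,
    and the outputs at initialisation.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical Jacobian of the outputs w.r.t. the parameters at initialisation.
    J = rng.normal(size=(n, p)) / np.sqrt(p)
    f0 = rng.normal(size=n)   # outputs at initialisation
    y = rng.normal(size=n)    # regression targets
    K = J @ J.T               # empirical tangent kernel of the linear model

    delta = np.zeros(p)       # delta = theta - theta0
    for _ in range(steps):
        f = f0 + J @ delta                    # linearised outputs
        grad = J.T @ (f - y) + lam * delta    # loss gradient + regulariser gradient
        delta -= lr * grad

    f_trained = f0 + J @ delta
    # Fixed point of df/dt = -K (f - y) - lam (f - f0).
    f_star = np.linalg.solve(K + lam * np.eye(n), K @ y + lam * f0)
    return f_trained, f_star, y, f0


if __name__ == "__main__":
    f_trained, f_star, y, f0 = simulate_regularised_least_squares()
    print("max |f_trained - f_star| =", np.max(np.abs(f_trained - f_star)))
    print("max |f_star - y|         =", np.max(np.abs(f_star - y)))
```

In this sketch, a positive `lam` means the trained outputs do not interpolate the targets: they settle at a compromise between the labels `y` and the initial outputs `f0`, weighted by the kernel `K`.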

The Challenges of the Nonlinear Regime for Physics-Informed Neural Networks

Dynamics of Deep Neural Networks and Neural Tangent Hierarchy

Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel

A Revision of Neural Tangent Kernel-based Approaches for Neural Networks

When and why PINNs fail to train: A neural tangent kernel perspective

Neural Tangent Kernel of Matrix Product States: Convergence and Applications

PAC-Bayes Generalisation Bounds for Dynamical Systems Including Stable RNNs

Neural Tangent Kernel Beyond the Infinite-Width Limit: Effects of Depth and Initialization

Evolution of Neural Tangent Kernels under Benign and Adversarial Training

A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks

Dynamically Stable Infinite-Width Limits of Neural Classifiers

Exact Convergence Rates of the Neural Tangent Kernel in the Large Depth Limit

Tensor Programs IIb: Architectural Universality of Neural Tangent Kernel Training Dynamics

Mixed Dynamics In Linear Networks: Unifying the Lazy and Active Regimes

Dynamics of Finite Width Kernel and Prediction Fluctuations in Mean Field Neural Networks

A Theory of Neural Tangent Kernel Alignment and Its Influence on Training

Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK

Wide neural networks: From non-gaussian random fields at initialization to the NTK geometry of training

Disentangling feature and lazy training in deep neural networks