A note on regularised NTK dynamics with an application to PAC-Bayesian training

Eugenio Clerico, Benjamin Guedj
2023-12-21
Abstract: We establish explicit dynamics for neural networks whose training objective has a regularising term that constrains the parameters to remain close to their initial value. This keeps the network in a lazy training regime, where the dynamics can be linearised around the initialisation. The standard neural tangent kernel (NTK) governs the evolution during training in the infinite-width limit, although the regularisation yields an additional term in the differential equation describing the dynamics. This setting provides an appropriate framework to study the evolution of wide networks trained to optimise generalisation objectives such as PAC-Bayes bounds, and hence can potentially contribute to a deeper theoretical understanding of such networks.
Machine Learning
What problem does this paper attempt to address?
The paper studies how a regularization term in the training objective affects the training dynamics of neural networks, especially in the infinite-width limit. The focus is on regularizers, such as an L2 penalty, that keep the network parameters close to their initial values, so that the network stays in a "lazy training" regime. In this regime the dynamics can be linearized around the initialization and, in the infinite-width limit, are governed by the Neural Tangent Kernel (NTK), which is fixed and deterministic. The paper first reviews the NTK dynamics of infinitely wide networks without regularization. The authors then show how regularization, specifically L2 regularization, changes these dynamics: it does not hinder the linearization, and it forces the parameters to stay close to their initialization, consistent with lazy training, while contributing an additional term to the differential equation that describes the evolution. The authors extend the analysis to more general regularization terms and apply it to least squares regression. The paper also discusses regularized dynamics in PAC-Bayes training, in which a network is trained by optimizing a PAC-Bayes bound. This setting offers a framework for a theoretical understanding of generalization in overparameterized networks. In summary, the paper addresses how to understand and analyze the training dynamics of infinitely wide neural networks under regularization, particularly L2 regularization, and how these dynamics relate to the network's generalization ability.
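The additional term mentioned above can be illustrated with a small numerical sketch. This is not the authors' code, and it makes several assumptions that are not in the source: a finite-dimensional model that is exactly linear in its parameters, f(theta) = f0 + J (theta - theta0), with a fixed Jacobian J standing in for the infinite-width limit; a squared loss 0.5 * ||f - y||^2; an L2 penalty of the assumed form 0.5 * lam * ||theta - theta0||^2; and plain Euler steps in place of continuous-time gradient flow. Under these assumptions the outputs follow df/dt = -K (f - y) - lam (f - f0) with K = J J^T, where the second term is the contribution of the penalty. All function and variable names are hypothetical.

```python
import numpy as np


def simulate_regularised_linear_dynamics(J, f0, y, lam, lr=0.1, steps=2000):
    """Euler-discretised gradient flow for an (assumed) linearised model.

    Model:   f(theta) = f0 + J @ (theta - theta0)
    Loss:    0.5 * ||f - y||^2 + 0.5 * lam * ||theta - theta0||^2
    Returns the final outputs and the displacement theta - theta0.
    """
    delta = np.zeros(J.shape[1])  # theta - theta0, zero at initialisation
    for _ in range(steps):
        f = f0 + J @ delta
        # Data-fit gradient plus the pull back towards the initialisation.
        grad = J.T @ (f - y) + lam * delta
        delta = delta - lr * grad
    return f0 + J @ delta, delta


def stationary_outputs(J, f0, y, lam):
    """Fixed point of df/dt = -K (f - y) - lam (f - f0), with K = J @ J.T."""
    K = J @ J.T
    return f0 + K @ np.linalg.solve(K + lam * np.eye(K.shape[0]), y - f0)


# Toy data (hypothetical): 5 training points, 50 parameters.
rng = np.random.default_rng(0)
n, p = 5, 50
J = rng.normal(size=(n, p)) / np.sqrt(p)
f0 = rng.normal(size=n)  # network outputs at initialisation
y = rng.normal(size=n)   # regression targets

f_sim, delta = simulate_regularised_linear_dynamics(J, f0, y, lam=0.5)
f_star = stationary_outputs(J, f0, y, lam=0.5)
print("max gap between simulation and fixed point:", np.max(np.abs(f_sim - f_star)))
print("parameter displacement norm:", np.linalg.norm(delta))
```

In this toy setting, setting lam to zero recovers the unregularized kernel dynamics, which interpolate the targets when K is invertible. A positive lam leaves the outputs between f0 and y and keeps the parameter displacement smaller.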