Quadratic models for understanding catapult dynamics of neural networks

Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, Mikhail Belkin
2024-05-02
Abstract: While neural networks can be approximated by linear models as their width increases, certain properties of wide neural networks cannot be captured by linear models. In this work we show that recently proposed Neural Quadratic Models can exhibit the "catapult phase" [Lewkowycz et al. 2020] that arises when training such models with large learning rates. We then empirically show that the behaviour of neural quadratic models parallels that of neural networks in generalization, especially in the catapult phase regime. Our analysis further demonstrates that quadratic models can be an effective tool for analysis of neural networks.
Machine Learning, Optimization and Control
What problem does this paper attempt to address?
This paper aims to explain the "catapult phase" that occurs when neural networks are trained with a large learning rate. Specifically, the authors examine why linear models fail to capture certain optimization and generalization properties of finite-width neural networks, properties that are most evident at large learning rates. To analyze the phenomenon, the paper studies the recently proposed Neural Quadratic Models (NQMs) and shows, through theoretical analysis and experiments, that NQMs behave similarly to finite-width neural networks, especially in the catapult phase.

### Main problems

1. **Limitations of linear models**: Although neural networks can be approximated by linear models as the network width increases, linear models cannot explain certain important optimization and generalization behaviors, especially at large learning rates.
2. **Understanding of the "catapult phase"**: How to explain and model the catapult phase that occurs during neural network training, that is, the phenomenon where the loss first increases rapidly and then decreases.
3. **Effectiveness of the quadratic model**: Whether Neural Quadratic Models capture and explain the behavior of neural networks in the catapult phase more accurately than linear models.

### Solutions

1. **Neural Quadratic Models (NQMs)**: Define NQMs through a second-order Taylor expansion of the network, which approximates its behavior more closely than the first-order (linear) expansion.
2. **Theoretical analysis**: Derive the dynamics of NQMs under gradient descent and analyze the optimization behavior at different learning rates.
3. **Experimental verification**: Evaluate NQMs on different datasets and network architectures, comparing them with linear models and with the original neural networks.

### Main contributions

1. **Catapult phase in NQMs**: Show, through theoretical analysis and experiments, that NQMs exhibit the catapult phase at large learning rates.
2. **Optimization dynamics**: Analyze in detail the optimization dynamics of NQMs under gradient descent, identifying three learning-rate regimes and the optimization behavior in each.
3. **Generalization**: Experiments show that NQMs achieve better test performance when the catapult phase occurs, for both shallow and deep networks.

### Conclusion

By using NQMs, this paper offers a new perspective on the behavior of neural networks in the catapult phase. NQMs capture nonlinear optimization and generalization properties of neural networks that linear models miss, and they provide a theoretical basis and experimental evidence for future research.
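
The summary above describes NQMs as a second-order Taylor expansion of the network and studies their gradient-descent dynamics. As a rough illustration only (this is not the authors' code), the sketch below builds the expansion g(w) = f(w0) + grad f(w0) . (w - w0) + 1/2 (w - w0)^T H(w0) (w - w0) for a toy two-layer network trained on a single example, and compares it with the first-order (linear) model. The network form, width, initialization, identity activation, squared loss, and the helper names `make_model` and `train` are all assumptions chosen for illustration. With the identity activation the network is itself quadratic in its weights, so the quadratic model coincides with it exactly; other smooth activations could be passed in instead.

```python
import math


def make_model(u0, v0, x, phi, dphi, ddphi, order=2):
    """Taylor model (order 1 or 2) of f(u, v) = sum_i v_i * phi(u_i * x) / sqrt(m),
    expanded around the initial weights (u0, v0). Hypothetical helper."""
    m = len(u0)
    s = 1.0 / math.sqrt(m)
    q = 1.0 if order == 2 else 0.0  # q = 0 drops the Hessian term (linear model)
    f0 = s * sum(v * phi(u * x) for u, v in zip(u0, v0))
    gu = [s * v * dphi(u * x) * x for u, v in zip(u0, v0)]            # df/du_i
    gv = [s * phi(u * x) for u in u0]                                  # df/dv_i
    huu = [q * s * v * ddphi(u * x) * x * x for u, v in zip(u0, v0)]   # d2f/du_i^2
    huv = [q * s * dphi(u * x) * x for u in u0]                        # d2f/(du_i dv_i)
    # d2f/dv_i^2 and all cross-neuron second derivatives vanish for this network.

    def value_and_grad(du, dv):
        """Model output and its gradient at displacement (du, dv) from (u0, v0)."""
        g, grad_u, grad_v = f0, [], []
        for i in range(m):
            g += gu[i] * du[i] + gv[i] * dv[i]
            g += 0.5 * huu[i] * du[i] ** 2 + huv[i] * du[i] * dv[i]
            grad_u.append(gu[i] + huu[i] * du[i] + huv[i] * dv[i])
            grad_v.append(gv[i] + huv[i] * du[i])
        return g, grad_u, grad_v

    return value_and_grad


def train(model, m, y, lr, steps):
    """Gradient descent on 0.5 * (g - y)^2, starting from zero displacement.
    Records the loss and the tangent kernel (squared gradient norm) per step."""
    du, dv = [0.0] * m, [0.0] * m
    losses, kernels = [], []
    for _ in range(steps):
        g, grad_u, grad_v = model(du, dv)
        r = g - y
        losses.append(0.5 * r * r)
        kernels.append(sum(a * a for a in grad_u) + sum(b * b for b in grad_v))
        du = [d - lr * r * a for d, a in zip(du, grad_u)]
        dv = [d - lr * r * b for d, b in zip(dv, grad_v)]
    return losses, kernels


# Toy setup: every value here is an illustrative assumption.
m, x, y = 100, 1.0, 0.0
u0 = [1.0] * m
v0 = [1.0] * 55 + [-1.0] * 45
identity = (lambda z: z, lambda z: 1.0, lambda z: 0.0)  # phi, phi', phi''

nqm = make_model(u0, v0, x, *identity, order=2)
lin = make_model(u0, v0, x, *identity, order=1)

lam0 = train(nqm, m, y, 0.0, 1)[1][0]  # tangent kernel at initialization
small_lr, large_lr = 1.0 / lam0, 3.0 / lam0  # below and above 2 / lam0

losses_small, _ = train(nqm, m, y, small_lr, 100)
losses_nqm, kernels_nqm = train(nqm, m, y, large_lr, 100)
losses_lin, _ = train(lin, m, y, large_lr, 100)

print("quadratic, small lr: first/peak/last loss",
      losses_small[0], max(losses_small), losses_small[-1])
print("quadratic, large lr: first/peak/last loss",
      losses_nqm[0], max(losses_nqm), losses_nqm[-1])
print("quadratic, large lr: kernel first/last", kernels_nqm[0], kernels_nqm[-1])
print("linear,    large lr: last loss", losses_lin[-1])
```

In this toy setup, a step size below 2 / lam0 gives a monotonically decreasing loss. At 3 / lam0 the linear model diverges, because its tangent kernel is fixed, while the quadratic model's loss first spikes and then converges as its tangent kernel shrinks, which is the qualitative catapult behavior the summary refers to.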