Abstract: While neural networks can be approximated by linear models as their width increases, certain properties of wide neural networks cannot be captured by linear models. In this work we show that recently proposed Neural Quadratic Models can exhibit the "catapult phase" [Lewkowycz et al. 2020] that arises when training such models with large learning rates. We then empirically show that the behaviour of neural quadratic models parallels that of neural networks in generalization, especially in the catapult phase regime. Our analysis further demonstrates that quadratic models can be an effective tool for analysis of neural networks.

What problem does this paper attempt to address?

This paper aims to explain the "catapult phase" that occurs when neural networks are trained with a large learning rate. The authors examine why linear models fail to capture certain optimization and generalization properties of finite-width neural networks, properties that are most visible at large learning rates. To analyze the phenomenon, the paper studies the recently proposed Neural Quadratic Models (NQMs) and shows, through theoretical analysis and experiments, that NQMs behave similarly to finite-width neural networks, especially in the catapult phase.

### Main problems

1. **Limitations of linear models**: Although neural networks can be approximated by linear models as the network width increases, linear models cannot explain certain important optimization and generalization behaviors, especially when a large learning rate is used.
2. **Understanding of the "catapult phase"**: How to explain and model the catapult phase that occurs during neural network training, that is, the phenomenon where the loss first increases rapidly and then decreases.
3. **Effectiveness of the quadratic model**: Whether the Neural Quadratic Model captures and explains the behavior of neural networks in the catapult phase more accurately than a linear model.

### Solutions

1. **Using Neural Quadratic Models (NQMs)**: Define NQMs through a second-order Taylor expansion of the network, which approximates its behavior more accurately than a first-order (linear) expansion.
2. **Theoretical analysis**: Derive the dynamic equations of NQMs under gradient descent and analyze the optimization behavior at different learning rates.
3. **Experimental verification**: Evaluate NQMs on different datasets and network architectures, in comparison with linear models and the original neural networks.

### Main contributions

1. **Proving the "catapult phase" of NQMs**: Through theoretical analysis and experiments, show that NQMs exhibit the catapult phase under a large learning rate.
2. **Optimization dynamics analysis**: Analyze in detail the optimization dynamics of NQMs under gradient descent, and identify three learning rate intervals with their corresponding optimization behaviors.
3. **Generalization performance improvement**: The experimental results show that NQMs achieve better test performance when the catapult phase occurs, in both shallow and deep networks.

### Conclusion

By studying NQMs, this paper provides a new perspective on the behavior of neural networks in the catapult phase. NQMs capture the nonlinear optimization and generalization characteristics of neural networks better than linear models, and they provide a theoretical basis and experimental evidence for future research.

Quadratic models for understanding catapult dynamics of neural networks
