Abstract: Substantial work indicates that the training dynamics of neural networks (NNs) are closely related to how their parameters are initialized. Inspired by the phase diagram for two-layer ReLU NNs with infinite width (Luo et al., 2021), we take a step towards drawing a phase diagram for three-layer ReLU NNs with infinite width. First, we derive a normalized gradient flow for three-layer ReLU NNs and obtain two key independent quantities that distinguish different dynamical regimes for common initialization methods. Through carefully designed experiments at a large computational cost, on both synthetic and real datasets, we find that the dynamics of each layer can also be divided into a linear regime and a condensed regime, separated by a critical regime. The criterion is the relative change of the input weights (the input weight of a hidden neuron consists of the weight from its input layer to the hidden neuron and its bias term) during training as the width approaches infinity, which tends to $0$, $+\infty$, and $O(1)$ in the linear, condensed, and critical regimes, respectively. We also demonstrate that different layers within a deep NN can lie in different dynamical regimes during the same training process. In the condensed regime, we further observe that weights condense onto isolated orientations with low complexity. Based on experiments in the three-layer setting, our phase diagram suggests that deep NNs exhibit complicated dynamics consisting of three possible regimes, together with their mixtures, and it provides guidance for studying deep NNs in different initialization regimes, revealing the possibility that completely different dynamics emerge in different layers of a deep NN.
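
The regime criterion described in the abstract can be illustrated with a minimal sketch. It assumes the relative change is measured as the ratio $\|\theta(t) - \theta(0)\|_2 / \|\theta(0)\|_2$ over a layer's input weights; the abstract does not state the exact formula, so this choice of norm is an assumption. The helper names `input_weights` and `relative_change` are hypothetical and not taken from the paper.

```python
import numpy as np

def input_weights(W, b):
    """Stack each hidden neuron's incoming weights with its bias term.

    W has shape (width, fan_in) and b has shape (width,);
    the result has shape (width, fan_in + 1).
    """
    return np.concatenate([np.asarray(W, dtype=float),
                           np.asarray(b, dtype=float)[:, None]], axis=1)

def relative_change(theta_init, theta_final):
    """Assumed criterion: ||theta_final - theta_init||_2 / ||theta_init||_2."""
    theta_init = np.asarray(theta_init, dtype=float)
    theta_final = np.asarray(theta_final, dtype=float)
    return np.linalg.norm(theta_final - theta_init) / np.linalg.norm(theta_init)

# Illustrative values only (not experimental data): a layer whose
# input weights barely move has a relative change close to 0.
rng = np.random.default_rng(0)
W0, b0 = rng.normal(size=(8, 3)), rng.normal(size=8)
theta0 = input_weights(W0, b0)
theta1 = theta0 + 1e-3 * rng.normal(size=theta0.shape)
print(relative_change(theta0, theta1))
```

To apply the criterion in the sense of the abstract, one would compute this quantity per layer for a sequence of increasing widths and check whether it shrinks towards $0$, grows without bound, or stays $O(1)$.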

Phase Diagram for Two-layer ReLU Neural Networks at Infinite-width Limit

Empirical Phase Diagram for Three-layer Neural Networks with Infinite Width

Understanding Multi-phase Optimization Dynamics and Rich Nonlinear Behaviors of ReLU Networks

Phase Diagram of Initial Condensation for Two-layer Neural Networks

Generalization Ability of Wide Neural Networks on $\mathbb{R}$

Towards Understanding the Condensation of Neural Networks at Initial Training

Convergence Analysis of Two-layer Neural Networks with ReLU Activation

On the Principles of ReLU Networks with One Hidden Layer

How Implicit Regularization of ReLU Neural Networks Characterizes the Learned Function -- Part I: the 1-D Case of Two Layers with Random First Layer

Universal Scaling Laws of Absorbing Phase Transitions in Artificial Deep Neural Networks

Towards Understanding the Condensation of Two-layer Neural Networks at Initial Training

A theoretical framework for deep locally connected ReLU network

Towards a General Theory of Infinite-Width Limits of Neural Classifiers

Phase diagram of early training dynamics in deep neural networks: effect of the learning rate, depth, and width

Weight decay induced phase transitions in multilayer neural networks

The Expressive Power of Neural Networks: A View from the Width

How (Implicit) Regularization of ReLU Neural Networks Characterizes the Learned Function -- Part II: the Multi-D Case of Two Layers with Random First Layer

The Geometric Structure of Fully-Connected ReLU Layers

On Multi-Stage Loss Dynamics in Neural Networks: Mechanisms of Plateau and Descent Stages

The Evolution of the Interplay Between Input Distributions and Linear Regions in Networks

A priori generalization error for two-layer ReLU neural network through minimum norm solution