Abstract: This article provides an expository account of training dynamics in the Deep Linear Network (DLN) from the perspective of the geometric theory of dynamical systems. Rigorous results by several authors are unified into a thermodynamic framework for deep learning. The analysis begins with a characterization of the invariant manifolds and Riemannian geometry in the DLN. This is followed by exact formulas for a Boltzmann entropy, as well as stochastic gradient descent of free energy using a Riemannian Langevin Equation. Several links between the DLN and other areas of mathematics are discussed, along with some open questions.

What problem does this paper attempt to address?

This paper addresses the training dynamics of deep linear networks (DLN), which it studies from the perspective of the geometric theory of dynamical systems. Using Riemannian geometry and invariant manifolds, the author seeks to explain some fundamental questions in deep learning:

1. **Convergence**: Do the parameters converge to an optimal solution during training?
2. **Convergence speed**: If they converge, how fast do they converge?
3. **Training efficiency**: How can training be made more efficient?
4. **Dependence on network structure and data scale**: How do the training dynamics depend on the network architecture and the size of the data set?

### Main contributions of the paper

The paper studies these questions in a simplified model, the deep linear network (DLN). The DLN is a deep learning model for linear functions, in which training reduces to a gradient flow on a space of matrices. Although the DLN looks simple, it has a rich mathematical structure and can reveal some complex phenomena in deep learning.

### Main technical means

1. **Invariant manifolds and Riemannian geometry**:
   - By studying the invariant manifolds (especially the equilibrium manifolds) of the DLN, the author reveals the influence of over-parametrization on the training dynamics.
   - Using tools from Riemannian geometry, a natural Boltzmann entropy is defined, and the microscopic fluctuations are described by a Riemannian Langevin equation.
2. **Degenerate loss functions**:
   - The paper discusses how the training dynamics of the DLN are affected when the loss function is degenerate (for example, in matrix completion tasks).
3. **Thermodynamic framework**:
   - The work of several authors is unified into a thermodynamic framework that explains the entropic origin of implicit regularization.
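The role of invariant manifolds can be illustrated numerically. The sketch below relies on a standard fact about the DLN that the summary does not spell out: under gradient flow, the differences \( G_p = W_{p+1}^T W_{p+1} - W_p W_p^T \) between consecutive layers are conserved, so trajectories stay on the level sets of these quantities. The quadratic loss, the layer sizes, the step size, and all function names here are assumptions chosen for illustration, not taken from the paper. Discrete gradient descent conserves \( G_p \) only approximately, with a drift that shrinks as the step size decreases.

```python
import numpy as np

def chain(mats, d):
    """Ordered product mats[-1] @ ... @ mats[0]; the identity if mats is empty."""
    out = np.eye(d)
    for M in mats:
        out = M @ out
    return out

def loss(Ws, target):
    """Assumed quadratic loss E(W) = 0.5 * ||W - target||_F^2 on the end-to-end matrix."""
    d = target.shape[0]
    return 0.5 * np.sum((chain(Ws, d) - target) ** 2)

def layer_grads(Ws, target):
    """Gradient of L = E(W_N ... W_1) with respect to each layer."""
    d = target.shape[0]
    dE = chain(Ws, d) - target        # E'(W) for the assumed quadratic loss
    grads = []
    for i in range(len(Ws)):
        A = chain(Ws[i + 1:], d)      # product of the layers above layer i
        B = chain(Ws[:i], d)          # product of the layers below layer i
        grads.append(A.T @ dE @ B.T)
    return grads

def balance(Ws):
    """G_p = W_{p+1}^T W_{p+1} - W_p W_p^T for each pair of consecutive layers."""
    return [Ws[i + 1].T @ Ws[i + 1] - Ws[i] @ Ws[i].T for i in range(len(Ws) - 1)]

def run_demo(N=3, d=3, step=1e-3, n_steps=5000, seed=0):
    rng = np.random.default_rng(seed)
    Ws = [0.5 * rng.standard_normal((d, d)) for _ in range(N)]
    target = np.diag(np.linspace(2.0, 0.5, d))   # hypothetical target matrix
    loss0, G0 = loss(Ws, target), balance(Ws)
    for _ in range(n_steps):
        grads = layer_grads(Ws, target)
        Ws = [W - step * g for W, g in zip(Ws, grads)]   # small-step gradient descent
    # Largest Frobenius-norm change in any G_p over the whole run
    drift = max(np.linalg.norm(G - H) for G, H in zip(balance(Ws), G0))
    return loss0, loss(Ws, target), drift

loss0, lossT, drift = run_demo()
print(f"loss: {loss0:.4f} -> {lossT:.4f}, max drift in G_p: {drift:.2e}")
```

Rerunning with a smaller `step` (and proportionally more steps) should reduce the reported drift, which is consistent with exact conservation in the continuous-time limit.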
### Formula summary

- **Gradient flow equation**: \[ \dot{\mathbf{W}} = -\nabla_{\mathbf{W}} L(\mathbf{W}) \] where \( \mathbf{W} = (W_N, \ldots, W_1) \) collects the layer matrices, \( L(\mathbf{W}) = E\circ\phi(\mathbf{W}) \), and \( W = \phi(\mathbf{W}) = W_N W_{N-1}\cdots W_1 \) is the end-to-end matrix.
- **Gradient flow on the equilibrium manifold**: \[ \dot{W} = -\sum_{p=1}^N (A_{p+1}A_{p+1}^T)\,E'(W)\,(B_{p-1}^T B_{p-1}) \] where \( A_p = W_N\cdots W_p \) and \( B_p = W_p\cdots W_1 \), with the empty products \( A_{N+1} \) and \( B_0 \) taken to be the identity.
- **Riemannian gradient flow**: \[ \dot{W} = -\text{grad}_{g_N}E(W) \] where \( g_N(W)(Z, Z) = \text{Tr}(Z^T A^{-1}_{N,W}Z) \).

Through these techniques and formulas, the author not only explains the training dynamics of the DLN but also offers a new perspective for understanding more complex deep learning models.
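As a short supporting derivation (a sketch, writing \( W = W_N\cdots W_1 \) for the end-to-end matrix and taking empty products to be the identity), the equation for \( \dot{W} \) in terms of \( A_{p+1} \) and \( B_{p-1} \) follows from the layerwise gradient flow by the chain and product rules. When only layer \( p \) is varied, \( dW = A_{p+1}\, dW_p\, B_{p-1} \), so:

\[
\begin{aligned}
dL &= \text{Tr}\big(E'(W)^T\, dW\big) = \text{Tr}\big((A_{p+1}^T\, E'(W)\, B_{p-1}^T)^T\, dW_p\big), \\
\dot{W}_p &= -\frac{\partial L}{\partial W_p} = -A_{p+1}^T\, E'(W)\, B_{p-1}^T, \qquad p = 1, \ldots, N, \\
\dot{W} &= \sum_{p=1}^N A_{p+1}\, \dot{W}_p\, B_{p-1} = -\sum_{p=1}^N A_{p+1}A_{p+1}^T\, E'(W)\, B_{p-1}^T B_{p-1}.
\end{aligned}
\]

The sum runs over the layer index \( p \), with one term for each layer whose weights move.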

The geometry of the deep linear network

Training Dynamics of Deep Network Linear Regions

How deep learning works --The geometry of deep learning

The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold

Markov-Lipschitz Deep Learning

Deep Manifold Part 1: Anatomy of Neural Network Manifold

Learning Curves for Deep Neural Networks: A Gaussian Field Theory Perspective

Speed Limits for Deep Learning

Low-Rank Learning by Design: the Role of Network Architecture and Activation Linearity in Gradient Rank Collapse

Understanding over-parameterized deep networks by geometrization

Geometry and Dynamics of LayerNorm

On the Geometry of Deep Learning

Weak Correlations as the Underlying Principle for Linearization of Gradient-Based Learning Systems

Absence of Closed-Form Descriptions for Gradient Flow in Two-Layer Narrow Networks

Deep Learning and Geometric Deep Learning: an introduction for mathematicians and physicists

Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations

Deep network as memory space: complexity, generalization, disentangled representation and interpretability

The Loss Surface of Deep Linear Networks Viewed Through the Algebraic Geometry Lens

Dynamics in Deep Classifiers Trained with the Square Loss: Normalization, Low Rank, Neural Collapse, and Generalization Bounds

A Study of the Mathematics of Deep Learning

Exploring the Manifold of Neural Networks Using Diffusion Geometry