Infinite‐width limit of deep linear neural networks

Lénaïc Chizat, Maria Colombo, Xavier Fernández‐Real, Alessio Figalli
DOI: https://doi.org/10.1002/cpa.22200
2024-05-07
Communications on Pure and Applied Mathematics
Abstract: This paper studies the infinite‐width limit of deep linear neural networks (NNs) initialized with random parameters. We obtain that, when the number of parameters diverges, the training dynamics converge (in a precise sense) to the dynamics obtained from a gradient descent on an infinitely wide deterministic linear NN. Moreover, even if the weights remain random, we get their precise law along the training dynamics, and prove a quantitative convergence result of the linear predictor in terms of the number of parameters. We finally study the continuous‐time limit obtained for infinitely wide linear NNs and show that the linear predictors of the NN converge at an exponential rate to the minimal ℓ₂‐norm minimizer of the risk.
mathematics, applied