Abstract: Mathematical methods are developed to characterize the asymptotics of recurrent neural networks (RNNs) as the number of hidden units, data samples in the sequence, hidden state updates, and training steps simultaneously grow to infinity. In the case of an RNN with a simplified weight matrix, we prove the convergence of the RNN to the solution of an infinite-dimensional ODE coupled with the fixed point of a random algebraic equation. The analysis requires addressing several challenges which are unique to RNNs. In typical mean-field applications (e.g., feedforward neural networks), discrete updates are of magnitude $\mathcal{O}(\frac{1}{N})$ and the number of updates is $\mathcal{O}(N)$. Therefore, the system can be represented as an Euler approximation of an appropriate ODE/PDE, to which it converges as $N \rightarrow \infty$. However, the RNN hidden layer updates are $\mathcal{O}(1)$. Therefore, RNNs cannot be represented as a discretization of an ODE/PDE and standard mean-field techniques cannot be applied. Instead, we develop a fixed point analysis for the evolution of the RNN memory states, with convergence estimates in terms of the number of update steps and the number of hidden units. The RNN hidden layer is studied as a function in a Sobolev space, whose evolution is governed by the data sequence (a Markov chain), the parameter updates, and its dependence on the RNN hidden layer at the previous time step. Due to the strong correlation between updates, a Poisson equation must be used to bound the fluctuations of the RNN around its limit equation. These mathematical methods give rise to the neural tangent kernel (NTK) limits for RNNs trained on data sequences as the number of data samples and the size of the neural network grow to infinity.
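The contrast drawn in the abstract between $\mathcal{O}(\frac{1}{N})$ updates and $\mathcal{O}(1)$ updates can be illustrated with a scalar toy problem. The sketch below is not the model analyzed in the paper: the ODE $\dot{\theta} = -\theta$, the map $h \mapsto \tanh(w h + x)$, the constant input `x` (standing in for the Markov chain data sequence), and all function names are assumptions chosen only for illustration.

```python
import math


def euler_endpoint(n_steps: int, horizon: float = 1.0) -> float:
    """Take n_steps updates of size O(1/n_steps) for d(theta)/dt = -theta, theta(0) = 1."""
    theta = 1.0
    step = horizon / n_steps
    for _ in range(n_steps):
        theta += step * (-theta)  # each update has magnitude O(1/n_steps)
    return theta


def iterate_memory(n_updates: int, weight: float = 0.5, x: float = 0.3) -> float:
    """Apply n_updates O(1) updates h <- tanh(weight * h + x), starting from h = 0.

    Hypothetical scalar stand-in for a hidden-state update; the input x is held
    fixed here, whereas the paper's data sequence is a Markov chain.
    """
    h = 0.0
    for _ in range(n_updates):
        h = math.tanh(weight * h + x)  # the whole state is replaced: an O(1) update
    return h


# Small updates: the scheme is an Euler approximation and approaches the ODE solution.
exact = math.exp(-1.0)
euler_errors = [abs(euler_endpoint(n) - exact) for n in (10, 100, 1000)]

# O(1) updates: no ODE limit, but since |weight| < 1 the map is a contraction
# and the iterates settle at a fixed point h* = tanh(weight * h* + x).
fixed_point = iterate_memory(200)
residual = abs(fixed_point - math.tanh(0.5 * fixed_point + 0.3))
first_jump = abs(iterate_memory(1) - iterate_memory(0))
```

In this toy setting the Euler error shrinks as the number of steps grows, while the hidden-state iteration jumps by an $\mathcal{O}(1)$ amount in its first step and is instead characterized by the fixed point it converges to. This mirrors, in a much simpler form, why the abstract replaces the ODE discretization argument with a fixed point analysis.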

Global Optimality of Elman-type RNN in the Mean-Field Regime

Dynamical Isometry and a Mean Field Theory of LSTMs and GRUs

Convergence Analysis of Regularized Elman Neural Networks under Relaxed Conditions

Generalization of Scaled Deep ResNets in the Mean-Field Regime

A Riemannian Mean Field Formulation for Two-layer Neural Networks with Batch Normalization

A Mean Field Theory of Batch Normalization

A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization

Understanding the training of infinitely deep and wide ResNets with Conditional Optimal Transport

Generalization Ability of Wide Residual Networks

Unified field theoretical approach to deep and recurrent neuronal networks

Convergence of Gradient Method for Elman Networks

Kernel Limit of Recurrent Neural Networks Trained on Ergodic Data Sequences

Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets

Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks

A Mean-Field Optimal Control Formulation of Deep Learning

Mean Field Analysis of Neural Networks: A Law of Large Numbers

Generalization Ability of Wide Neural Networks on $\mathbb{R}$

Convergence of Gradient Descent for Recurrent Neural Networks: A Nonasymptotic Analysis

Mean-Field Analysis for Learning Subspace-Sparse Polynomials with Gaussian Input