Abstract: Mathematical methods are developed to characterize the asymptotics of recurrent neural networks (RNNs) as the number of hidden units, data samples in the sequence, hidden state updates, and training steps simultaneously grow to infinity. In the case of an RNN with a simplified weight matrix, we prove the convergence of the RNN to the solution of an infinite-dimensional ODE coupled with the fixed point of a random algebraic equation. The analysis requires addressing several challenges that are unique to RNNs. In typical mean-field applications (e.g., feedforward neural networks), discrete updates are of magnitude $\mathcal{O}(\frac{1}{N})$ and the number of updates is $\mathcal{O}(N)$. The system can therefore be represented as an Euler approximation of an appropriate ODE/PDE, to which it converges as $N \rightarrow \infty$. However, the RNN hidden layer updates are $\mathcal{O}(1)$. Consequently, RNNs cannot be represented as a discretization of an ODE/PDE, and standard mean-field techniques cannot be applied. Instead, we develop a fixed point analysis for the evolution of the RNN memory states, with convergence estimates in terms of the number of update steps and the number of hidden units. The RNN hidden layer is studied as a function in a Sobolev space, whose evolution is governed by the data sequence (a Markov chain), the parameter updates, and its dependence on the RNN hidden layer at the previous time step. Due to the strong correlation between updates, a Poisson equation must be used to bound the fluctuations of the RNN around its limit equation. These mathematical methods give rise to the neural tangent kernel (NTK) limits for RNNs trained on data sequences as the number of data samples and the size of the neural network grow to infinity.
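The contrast between $\mathcal{O}(\frac{1}{N})$ updates and $\mathcal{O}(1)$ updates can be illustrated with a toy scalar sketch. This is not the model analyzed in the paper: the ODE $dx/dt = -x$, the affine map $h \mapsto a h + b$, and all function names below are assumptions chosen only for illustration. The first function takes $N$ steps of size $1/N$, which is an Euler scheme converging to the ODE solution; the second takes steps of size $\mathcal{O}(1)$, which has no ODE limit but converges to a fixed point when the map is a contraction.

```python
import math


def euler_small_updates(n_units):
    """N updates of size O(1/N): an Euler scheme for dx/dt = -x on [0, 1]."""
    x = 1.0
    for _ in range(n_units):
        x += (1.0 / n_units) * (-x)
    return x


def iterate_order_one_updates(n_steps, a=0.5, b=1.0):
    """O(1) updates h <- a*h + b: a contraction for |a| < 1, fixed point b / (1 - a)."""
    h = 0.0
    for _ in range(n_steps):
        h = a * h + b
    return h


# Error of the Euler scheme against the exact ODE solution exp(-1) at t = 1.
euler_errors = [abs(euler_small_updates(n) - math.exp(-1.0)) for n in (10, 100, 1000)]

# Error of the O(1) iteration against its fixed point b / (1 - a) = 2.
fixed_point_errors = [abs(iterate_order_one_updates(k) - 2.0) for k in (5, 10, 20)]

print(euler_errors)
print(fixed_point_errors)
```

In this toy setting the Euler error shrinks as $N$ grows, while the $\mathcal{O}(1)$ iteration is controlled by the number of update steps, mirroring the two kinds of convergence estimates mentioned in the abstract.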

Neural ODEs as the deep limit of ResNets with constant weights

Stochastic Neural Networks with Infinite Width are Deterministic

Deep Limits of Residual Neural Networks

Do Residual Neural Networks discretize Neural Ordinary Differential Equations?

Implicit regularization of deep residual networks towards neural ODEs

Variational formulations of ODE-Net as a mean-field optimal control problem and existence results

Neural Ordinary Differential Equations with Envolutionary Weights

Infinite‐width limit of deep linear neural networks

The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion

Generalization bounds for neural ordinary differential equations and deep residual networks

Infinitely Deep Bayesian Neural Networks with Stochastic Differential Equations

Exact Solutions of a Deep Linear Network

Non-convergence to global minimizers in data driven supervised deep learning: Adam and stochastic gradient descent optimization provably fail to converge to global minimizers in the training of deep neural networks with ReLU activation

Dynamics of Deep Neural Networks and Neural Tangent Hierarchy

Analysis of the rate of convergence of fully connected deep neural network regression estimates with smooth activation function

Scaling ResNets in the Large-depth Regime

Neural signature kernels as infinite-width-depth-limits of controlled ResNets

Kernel Limit of Recurrent Neural Networks Trained on Ergodic Data Sequences

Deep linear networks for regression are implicitly regularized towards flat minima

Deep neural network expressivity for optimal stopping problems

Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions