On the difficulty of learning chaotic dynamics with RNNs

Jonas M. Mikhaeil, Zahra Monfared, Daniel Durstewitz

DOI: https://doi.org/10.48550/arXiv.2110.07238

2022-10-07

Abstract: Recurrent neural networks (RNNs) are wide-spread machine learning tools for modeling sequential and time series data. They are notoriously hard to train because their loss gradients backpropagated in time tend to saturate or diverge during training. This is known as the exploding and vanishing gradient problem. Previous solutions to this issue either built on rather complicated, purpose-engineered architectures with gated memory buffers, or - more recently - imposed constraints that ensure convergence to a fixed point or restrict (the eigenspectrum of) the recurrence matrix. Such constraints, however, convey severe limitations on the expressivity of the RNN. Essential intrinsic dynamics such as multistability or chaos are disabled. This is inherently at disaccord with the chaotic nature of many, if not most, time series encountered in nature and society. It is particularly problematic in scientific applications where one aims to reconstruct the underlying dynamical system. Here we offer a comprehensive theoretical treatment of this problem by relating the loss gradients during RNN training to the Lyapunov spectrum of RNN-generated orbits. We mathematically prove that RNNs producing stable equilibrium or cyclic behavior have bounded gradients, whereas the gradients of RNNs with chaotic dynamics always diverge. Based on these analyses and insights we suggest ways of how to optimize the training process on chaotic data according to the system's Lyapunov spectrum, regardless of the employed RNN architecture.

Machine Learning, Dynamical Systems

What problem does this paper attempt to address?

The paper addresses the exploding and vanishing gradient problem (EVGP) that arises when training recurrent neural networks (RNNs), especially on chaotic time series data. Specifically, it focuses on:

1. **Exploding and vanishing gradients**: When RNNs are trained on data with long-term dependencies, or with very slow or widely varying time scales, the loss gradients backpropagated through time tend to saturate or diverge.
2. **Challenges of chaotic dynamics**: Many time series in nature and society are chaotic. Existing remedies for the EVGP either rely on purpose-engineered gated architectures (such as LSTM and GRU) or impose constraints that force convergence to a fixed point or restrict the recurrence matrix. Such constraints severely limit the expressivity of the RNN and rule out dynamics such as multistability or chaos.
3. **Theoretical analysis and solutions**: The paper relates the loss gradients during RNN training to the Lyapunov spectrum of RNN-generated orbits. The authors prove that RNNs producing stable equilibrium or cyclic behavior have bounded gradients, whereas the gradients of RNNs with chaotic dynamics always diverge. Based on this analysis, the paper proposes adapting the training process to the system's Lyapunov spectrum, specifically through sparsely forced BPTT, to keep gradients from exploding on chaotic data.

In summary, the paper combines theoretical analysis with experimental validation to provide a method for training RNNs on chaotic time series data despite the gradient problems described above.
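The link between gradients and Lyapunov exponents can be illustrated numerically. The sketch below is not the paper's model or code. It assumes a plain input-free tanh RNN, `z_next = tanh(W @ z)`, with random Gaussian weights scaled by a hypothetical `gain` parameter, and the function name `max_lyapunov_exponent` is chosen here for illustration. It estimates the maximum Lyapunov exponent by pushing a tangent vector through the same state-to-state Jacobians that backpropagation through time multiplies together. A negative value means these Jacobian products shrink exponentially with sequence length, while a positive value means they grow exponentially.

```python
import numpy as np


def max_lyapunov_exponent(gain, n_units=100, n_steps=2000, n_burn_in=200, seed=0):
    """Estimate the largest Lyapunov exponent of z_next = tanh(W @ z).

    Assumption: W has i.i.d. Gaussian entries with standard deviation
    gain / sqrt(n_units); this is an illustrative toy RNN only.
    """
    rng = np.random.default_rng(seed)
    W = gain * rng.standard_normal((n_units, n_units)) / np.sqrt(n_units)
    z = rng.standard_normal(n_units)

    # Random unit tangent vector that will be propagated by the Jacobians.
    v = rng.standard_normal(n_units)
    v /= np.linalg.norm(v)

    # Discard the initial transient so the orbit settles near its attractor.
    for _ in range(n_burn_in):
        z = np.tanh(W @ z)

    log_growth = 0.0
    for _ in range(n_steps):
        z = np.tanh(W @ z)
        # Jacobian of the update is diag(1 - z_next**2) @ W; apply it to v.
        v = (1.0 - z**2) * (W @ v)
        norm = np.linalg.norm(v)
        log_growth += np.log(norm)
        v /= norm  # renormalize to avoid numerical overflow or underflow

    # Average exponential growth rate per time step.
    return log_growth / n_steps


if __name__ == "__main__":
    for gain in (0.3, 3.0):
        lam = max_lyapunov_exponent(gain)
        regime = "growing" if lam > 0 else "shrinking"
        print(f"gain={gain}: lambda_max ~ {lam:.3f} ({regime} Jacobian products)")
```

With a small gain the toy network contracts to a fixed point and the estimated exponent is negative. With a large gain the orbit is chaotic and the estimate is positive, so the Jacobian products that enter the gradient grow roughly like `exp(lambda_max * T)` over `T` steps.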

