On the difficulty of learning chaotic dynamics with RNNs

Jonas M. Mikhaeil, Zahra Monfared, Daniel Durstewitz
DOI: https://doi.org/10.48550/arXiv.2110.07238
2022-10-07
Abstract: Recurrent neural networks (RNNs) are wide-spread machine learning tools for modeling sequential and time series data. They are notoriously hard to train because their loss gradients backpropagated in time tend to saturate or diverge during training. This is known as the exploding and vanishing gradient problem. Previous solutions to this issue either built on rather complicated, purpose-engineered architectures with gated memory buffers, or - more recently - imposed constraints that ensure convergence to a fixed point or restrict (the eigenspectrum of) the recurrence matrix. Such constraints, however, convey severe limitations on the expressivity of the RNN. Essential intrinsic dynamics such as multistability or chaos are disabled. This is inherently at disaccord with the chaotic nature of many, if not most, time series encountered in nature and society. It is particularly problematic in scientific applications where one aims to reconstruct the underlying dynamical system. Here we offer a comprehensive theoretical treatment of this problem by relating the loss gradients during RNN training to the Lyapunov spectrum of RNN-generated orbits. We mathematically prove that RNNs producing stable equilibrium or cyclic behavior have bounded gradients, whereas the gradients of RNNs with chaotic dynamics always diverge. Based on these analyses and insights we suggest ways of how to optimize the training process on chaotic data according to the system's Lyapunov spectrum, regardless of the employed RNN architecture.
Machine Learning, Dynamical Systems
What problem does this paper attempt to address?
The paper addresses the exploding and vanishing gradient problem (EVGP) that arises when training Recurrent Neural Networks (RNNs), especially on chaotic time series data. Specifically, the paper focuses on:

1. **Exploding and Vanishing Gradient Problem**: When RNNs are trained on data with long-term dependencies, or with very slow or highly variable time scales, the loss gradients backpropagated through time tend to saturate or diverge. This is known as the exploding and vanishing gradient problem (EVGP).
2. **Challenges of Chaotic Dynamics**: Many time series from nature and society are chaotic. Previous remedies for the EVGP either rely on purpose-engineered gated architectures (such as LSTM or GRU) or impose constraints that force convergence to a fixed point or restrict the recurrence matrix. Such constraints severely limit the expressivity of the RNN and rule out essential dynamics such as multistability or chaos.
3. **Theoretical Analysis and Solutions**: The paper relates the loss gradients during RNN training to the Lyapunov spectrum of RNN-generated orbits. The authors prove that RNNs producing stable equilibrium or cyclic behavior have bounded gradients, whereas the gradients of RNNs with chaotic dynamics always diverge. Based on these results, the paper proposes a training method tuned to the system's Lyapunov spectrum, specifically sparsely forced BPTT, to keep gradients from exploding on chaotic data.

In summary, the paper combines theoretical analysis with experimental validation to provide a method for training RNNs on chaotic time series data that overcomes the gradient problems described above.
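
The link between loss gradients and the Lyapunov spectrum can be illustrated numerically. Gradients in backpropagation through time contain products of step-wise Jacobians along the orbit, and the norm of such a product over T steps grows or shrinks roughly like exp(λ_max · T), where λ_max is the largest Lyapunov exponent. The following is a minimal sketch under stated assumptions, not the authors' implementation: it uses a plain random tanh network `z_{t+1} = tanh(W z_t)` as a stand-in for an RNN, and the function name `max_lyapunov`, the gain values 0.3 and 3.0, and the network size are all hypothetical choices made for illustration.

```python
import numpy as np


def max_lyapunov(W, T=2000, burn_in=200, seed=0):
    """Estimate the largest Lyapunov exponent of z_{t+1} = tanh(W z_t).

    A tangent vector v is pushed through the step-wise Jacobians and
    renormalized at every step; the average log growth of its norm
    approximates the largest Lyapunov exponent.
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    z = rng.standard_normal(n)   # arbitrary initial state (assumption)
    v = rng.standard_normal(n)   # arbitrary initial tangent vector
    v /= np.linalg.norm(v)

    log_growth = 0.0
    for t in range(burn_in + T):
        z = np.tanh(W @ z)
        # One-step Jacobian is diag(1 - z_{t+1}^2) @ W; apply it to v.
        v = (1.0 - z ** 2) * (W @ v)
        norm = np.linalg.norm(v)
        v /= norm
        if t >= burn_in:         # skip the transient before averaging
            log_growth += np.log(norm)
    return log_growth / T


# Illustrative random recurrence matrix, scaled by a gain factor.
rng = np.random.default_rng(0)
n = 100
G = rng.standard_normal((n, n)) / np.sqrt(n)

lam_stable = max_lyapunov(0.3 * G)   # small gain: orbit decays to a fixed point
lam_chaotic = max_lyapunov(3.0 * G)  # large gain: irregular, chaotic-looking orbit

print(f"small gain: lambda_max ~ {lam_stable:.3f}")
print(f"large gain: lambda_max ~ {lam_chaotic:.3f}")
```

In this setup the small-gain estimate should come out negative, so Jacobian products (and with them the gradients) stay bounded, while the large-gain estimate should come out positive, so the products grow exponentially with the length of the backpropagation window.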