Abstract: Recurrent neural networks (RNNs) are popular machine learning tools for modeling and forecasting sequential data and for inferring dynamical systems (DS) from observed time series. Concepts from DS theory (DST) have variously been used to further our understanding of both how trained RNNs solve complex tasks and the training process itself. Bifurcations are particularly important phenomena in DS, including RNNs, that refer to topological (qualitative) changes in a system's dynamical behavior as one or more of its parameters are varied. Knowing the bifurcation structure of an RNN will thus allow one to deduce many of its computational and dynamical properties, like its sensitivity to parameter variations or its behavior during training. In particular, bifurcations may account for sudden loss jumps observed in RNN training that could severely impede the training process. Here we first mathematically prove for a particular class of ReLU-based RNNs that certain bifurcations are indeed associated with loss gradients tending toward infinity or zero. We then introduce a novel heuristic algorithm for detecting all fixed points and k-cycles in ReLU-based RNNs and their existence and stability regions, and hence bifurcation manifolds in parameter space. In contrast to previous numerical algorithms for finding fixed points and common continuation methods, our algorithm provides exact results and returns fixed points and cycles up to high orders with surprisingly good scaling behavior. We exemplify the algorithm on the analysis of the training process of RNNs, and find that the recently introduced technique of generalized teacher forcing completely avoids certain types of bifurcations in training. Thus, besides facilitating the DST analysis of trained RNNs, our algorithm provides a powerful instrument for analyzing the training process itself.

What problem does this paper attempt to address?

The paper primarily focuses on the bifurcations that occur during the training of Recurrent Neural Networks (RNNs) and their impact on the loss function. Specifically, the paper addresses the following issues:

1. **Relationship between bifurcation phenomena and loss jumps**:
   - The paper demonstrates that in certain types of ReLU-based RNNs, some bifurcations indeed cause the loss gradient to approach infinity or zero, leading to sudden jumps in the loss function. These bifurcations can severely hinder the training process.
2. **Efficient methods for detecting fixed points and periodic points**:
   - A novel heuristic algorithm (SCYFI) is proposed to detect all fixed points and periodic points in ReLU-based RNNs, as well as their regions of existence and stability, thereby determining the bifurcation manifolds in parameter space. Compared to previous methods, this algorithm not only provides exact results but also scales well.
3. **Analysis of the impact of bifurcation phenomena on the training process**:
   - By analyzing the loss landscape during the training process, it is found that bifurcation curves closely overlap with steep loss cliffs, and bifurcations in the system dynamics are accompanied by sudden jumps in the loss function. Additionally, the paper proves and demonstrates that the recently proposed Generalized Teacher Forcing (GTF) technique can completely eliminate certain types of bifurcations during training, thereby explaining its effectiveness.

In summary, this paper aims to reveal the nature of bifurcation phenomena during RNN training and their impact on the loss function through theoretical proofs, algorithm development, and experimental validation. It also proposes effective solutions to optimize the training process.
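To make the fixed-point problem in point 2 concrete, here is a minimal sketch of why exact fixed points are computable in a ReLU-based RNN at all. It assumes a piecewise-linear map of the form z_{t+1} = A z_t + W ReLU(z_t) + h, which is an assumption made for illustration since the text above does not spell out the model class. The function name `fixed_points_by_enumeration` is hypothetical, and the code is a brute-force enumeration of all 2^M activation patterns, not the SCYFI heuristic itself (whose point is to avoid this exhaustive search).

```python
import itertools
import numpy as np


def fixed_points_by_enumeration(A, W, h):
    """Exhaustively find fixed points of z -> A z + W relu(z) + h.

    Illustrative brute force only: cost grows as 2^M, so this is
    practical for small state dimension M. Not the SCYFI algorithm.
    Returns a list of (fixed_point, is_stable) tuples.
    """
    M = len(h)
    identity = np.eye(M)
    found = []
    # Each binary pattern fixes which units are active (z_i > 0).
    for pattern in itertools.product([0.0, 1.0], repeat=M):
        D = np.diag(pattern)
        J = A + W @ D  # the map is linear with Jacobian J in this region
        try:
            # Inside the region, z* = J z* + h  <=>  (I - J) z* = h.
            z = np.linalg.solve(identity - J, h)
        except np.linalg.LinAlgError:
            continue  # singular system: no isolated fixed point here
        # Keep z* only if it actually lies in the region it was solved for.
        if np.all((z > 0) == (np.array(pattern) == 1.0)):
            # Stable if all eigenvalues of J lie inside the unit circle.
            stable = bool(np.max(np.abs(np.linalg.eigvals(J))) < 1.0)
            found.append((z, stable))
    return found


# One-dimensional toy system (hypothetical parameter values).
A = np.array([[0.5]])
W = np.array([[1.0]])
before = fixed_points_by_enumeration(A, W, np.array([-1.0]))
after = fixed_points_by_enumeration(A, W, np.array([1.0]))
print(len(before), len(after))
```

In this toy system, h = -1 gives one stable and one unstable fixed point, while h = 1 gives none: both fixed points vanish as h crosses zero. That is a small instance of the kind of qualitative change in dynamics under a parameter change that the summary calls a bifurcation.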

Bifurcations and loss jumps in RNN training
