Abstract: It has been found recently that more data can, counter-intuitively, hurt the performance of deep neural networks. Here, we show that a more extreme version of the phenomenon occurs in data-driven models of dynamical systems. To elucidate the underlying mechanism, we focus on next-generation reservoir computing (NGRC), a popular framework for learning dynamics from data. We find that, despite learning a better representation of the flow map with more training data, NGRC can adopt an ill-conditioned "integrator" and lose stability. We link this data-induced instability to the auxiliary dimensions created by the delayed states in NGRC. Based on these findings, we propose simple strategies to mitigate the instability, either by increasing regularization strength in tandem with data size, or by carefully introducing noise during training. Our results highlight the importance of proper regularization in data-driven modeling of dynamical systems.

What problem does this paper attempt to address?

The paper examines how more training data can, counterintuitively, degrade the performance of next-generation reservoir computing (NGRC), a popular framework for learning dynamical systems from data, to the point where long-term predictions become unstable. Although more training data improves the learned representation of the flow map, NGRC can adopt an ill-conditioned "integrator" and lose stability. The paper links this data-induced instability to the auxiliary dimensions created by the delayed states in NGRC.

In a case study of a magnetic pendulum system, the NGRC model captures the complex basins of attraction more accurately as the number of training trajectories increases. Beyond a certain threshold, however, a model that was stable with less data becomes unstable, and all predicted trajectories diverge to infinity. The onset of instability depends on the regularization strength: stronger regularization delays it, so larger datasets call for more aggressive regularization. The paper rules out overfitting of the flow surfaces as the cause and instead explains the instability from a numerical-analysis perspective, treating the NGRC model as an integrator. As the training set grows, the learned integrator becomes increasingly unstable, which is reflected in a growing condition number κ of the readout matrix.

The paper proposes two mitigation strategies: increasing the regularization strength in tandem with the data size, and carefully introducing noise during training. It also offers a complementary geometric explanation: NGRC effectively learns the flow map in a higher-dimensional space. When it fits more data on the lower-dimensional submanifold occupied by the real system, instability arises in the other dimensions, transverse to that submanifold, so predictions diverge whenever the model does not fit the flow map perfectly or the starting point lies off the submanifold. In summary, the paper shows that appropriate regularization is crucial for avoiding data-induced instability when modeling dynamical systems from data.
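
The first mitigation strategy, raising the regularization strength together with the data size, can be motivated by a generic property of ridge regression. The sketch below is not the paper's NGRC setup. It assumes a single-feature model y = w * x and an unnormalized loss sum_i (y_i - w * x_i)^2 + lam * w^2; the summary does not state how the paper normalizes its loss. The names `shrinkage`, `ridge_slope`, `lam0`, and `n0` are hypothetical. Under these assumptions, a fixed penalty loses influence as samples accumulate, while a penalty scaled in proportion to the sample count keeps its relative weight.

```python
def shrinkage(xs, lam):
    """Fraction of the unregularized least-squares slope kept by ridge.

    For the assumed loss sum_i (y_i - w*x_i)^2 + lam * w^2, the ridge slope
    equals the least-squares slope multiplied by sxx / (sxx + lam).
    """
    sxx = sum(x * x for x in xs)
    return sxx / (sxx + lam)


def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate of w in the one-feature model y = w * x."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / (sxx + lam)


if __name__ == "__main__":
    lam0 = 1.0  # hypothetical baseline regularization strength
    n0 = 10     # hypothetical baseline sample count
    for n in (10, 100, 1000, 10000):
        # Deterministic toy inputs alternating between +0.5 and -0.5.
        xs = [0.5 if i % 2 == 0 else -0.5 for i in range(n)]
        fixed = shrinkage(xs, lam0)            # penalty held constant
        scaled = shrinkage(xs, lam0 * n / n0)  # penalty grows with n
        print(f"n={n:6d}  fixed-lam shrinkage={fixed:.4f}  "
              f"scaled-lam shrinkage={scaled:.4f}")
```

With the penalty held fixed, the shrinkage factor approaches 1 as n grows, so the fit is effectively unregularized for large datasets. Scaling the penalty with n keeps the factor constant. Whether this simple argument fully accounts for the instability reported for NGRC is not something the summary establishes.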

How more data can hurt: Instability and regularization in next-generation reservoir computing
