Abstract: Recurrent neural networks (RNNs) in the brain and in silico excel at solving tasks with intricate temporal dependencies. The long timescales required to solve such tasks can arise from properties of individual neurons (single-neuron timescale, $\tau$, e.g., the membrane time constant in biological neurons) or from recurrent interactions among them (network-mediated timescale). However, the contribution of each mechanism to optimally solving memory-dependent tasks remains poorly understood. Here, we train RNNs to solve $N$-parity and $N$-delayed match-to-sample tasks, whose memory requirements increase with $N$, by simultaneously optimizing recurrent weights and $\tau$s. We find that for both tasks RNNs develop longer timescales with increasing $N$, but they use different mechanisms depending on the learning objective. Two distinct curricula define the learning objectives: sequential learning of a single $N$ (single-head) or simultaneous learning of multiple $N$s (multi-head). Single-head networks increase their $\tau$ with $N$ and are able to solve tasks for large $N$, but they suffer from catastrophic forgetting. In contrast, multi-head networks, which are explicitly required to hold multiple concurrent memories, keep $\tau$ constant and develop longer timescales through recurrent connectivity. Moreover, we show that the multi-head curriculum increases training speed and network stability against ablations and perturbations, and allows RNNs to generalize better to tasks beyond their training regime. This curriculum also significantly improves the training of GRUs and LSTMs on large-$N$ tasks. Our results suggest that adapting timescales to task requirements via recurrent interactions allows networks to learn more complex objectives and improves RNN performance.
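
The abstract does not spell out the model equations or task definitions, so the following is only a minimal pure-Python sketch under stated assumptions, not the authors' implementation. It assumes a leaky rate RNN in which each neuron $i$ has its own timescale $\tau_i$, updated as $h_i \leftarrow (1 - 1/\tau_i)\,h_i + (1/\tau_i)\,\mathrm{ReLU}(\sum_j W_{ij} h_j + w^{\mathrm{in}}_i x + b_i)$; an $N$-parity target equal to the sum of the last $N$ binary inputs modulo 2; an $N$-DMS target equal to 1 when the current input matches the input $N$ steps earlier; and a multi-head objective that sums one binary cross-entropy per head. All names (`LeakyRNN`, `n_parity_targets`, `n_dms_targets`, `multi_head_loss`) are hypothetical, and training (jointly optimizing $W$ and $\tau$ by gradient descent) is omitted.

```python
import math
import random


def n_parity_targets(bits, n):
    """Target at step t: parity (sum mod 2) of the last n inputs; None until n inputs are seen."""
    return [None if t + 1 < n else sum(bits[t - n + 1 : t + 1]) % 2 for t in range(len(bits))]


def n_dms_targets(bits, n):
    """Target at step t: 1 if the input at t equals the input at t - n, else 0; None for t < n."""
    return [None if t < n else int(bits[t] == bits[t - n]) for t in range(len(bits))]


class LeakyRNN:
    """Rate RNN with a per-neuron timescale tau_i >= 1 (assumed discretization):
    h_i <- (1 - 1/tau_i) * h_i + (1/tau_i) * relu(sum_j W_ij h_j + w_in_i * x + b_i)."""

    def __init__(self, n_hidden, n_heads, tau_init=1.0, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(n_hidden)
        self.W = [[rng.gauss(0.0, scale) for _ in range(n_hidden)] for _ in range(n_hidden)]
        self.w_in = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden
        # Assumption: tau would be optimized jointly with W; here it is fixed.
        self.tau = [tau_init] * n_hidden
        # One linear readout head per memory length N (a single head in the single-head case).
        self.heads = [[rng.gauss(0.0, scale) for _ in range(n_hidden)] for _ in range(n_heads)]

    def step(self, h, x):
        """Advance the hidden state by one time step for scalar input x."""
        pre = [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + w_i * x + b_i
               for row, w_i, b_i in zip(self.W, self.w_in, self.b)]
        return [(1.0 - 1.0 / tau) * h_i + (1.0 / tau) * max(0.0, p)
                for h_i, p, tau in zip(h, pre, self.tau)]

    def run(self, inputs):
        """Return, for every time step, one logit per readout head."""
        h = [0.0] * len(self.b)
        outputs = []
        for x in inputs:
            h = self.step(h, float(x))
            outputs.append([sum(w * h_i for w, h_i in zip(head, h)) for head in self.heads])
        return outputs


def bce_with_logit(z, y):
    """Numerically stable binary cross-entropy for logit z and label y in {0, 1}."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def multi_head_loss(outputs, targets_per_head):
    """Sum over heads of the mean BCE, skipping steps without a target (None)."""
    total = 0.0
    for k, targets in enumerate(targets_per_head):
        terms = [bce_with_logit(out[k], y) for out, y in zip(outputs, targets) if y is not None]
        if terms:
            total += sum(terms) / len(terms)
    return total


# Usage: one network with three heads evaluated jointly on N = 2, 3, 4 (multi-head objective).
rng = random.Random(1)
bits = [rng.randint(0, 1) for _ in range(50)]
ns = [2, 3, 4]
net = LeakyRNN(n_hidden=16, n_heads=len(ns), tau_init=2.0)
outputs = net.run(bits)
loss = multi_head_loss(outputs, [n_parity_targets(bits, n) for n in ns])
```

With $\tau_i = 1$ the update reduces to an ordinary non-leaky RNN step, while larger $\tau_i$ makes neuron $i$ integrate its input more slowly, which corresponds to the single-neuron route to long timescales. In this sketch a single-head curriculum would use one head trained on one $N$ at a time, whereas the multi-head variant scores all $N$ values at once through the summed loss.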

Emergent mechanisms for long timescales depend on training curriculum and affect performance in memory tasks