Abstract: Modern deep-learning systems are specialized to problem settings in which training occurs once and then never again, as opposed to continual-learning settings in which training occurs continually. If deep-learning systems are applied in a continual-learning setting, then it is well known that they may fail to remember earlier examples. More fundamental, but less well known, is that they may also lose their ability to learn on new examples, a phenomenon called loss of plasticity. We provide direct demonstrations of loss of plasticity using the MNIST and ImageNet datasets repurposed for continual learning as sequences of tasks. In ImageNet, binary classification performance dropped from 89% accuracy on an early task down to 77%, about the level of a linear network, on the 2000th task. Loss of plasticity occurred with a wide range of deep network architectures, optimizers, and activation functions, and with batch normalization and dropout, but was substantially eased by L2-regularization, particularly when combined with weight perturbation. Further, we introduce a new algorithm, continual backpropagation, which slightly modifies conventional backpropagation to reinitialize a small fraction of less-used units after each example and appears to maintain plasticity indefinitely.
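As a rough illustration of the mitigation the abstract mentions (L2-regularization combined with weight perturbation), the sketch below shows one gradient step that applies an L2 penalty and then adds a small random perturbation to the weights. The function name `sgd_step_l2_perturb`, the Gaussian form of the noise, and all default values are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def sgd_step_l2_perturb(w, grad, lr=0.01, l2=1e-4, noise_std=1e-4, rng=None):
    """One SGD step with an L2 penalty followed by a weight perturbation.

    Hypothetical helper: the noise distribution and default values are
    assumptions, not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gradient step on loss + (l2 / 2) * ||w||^2, which shrinks weights toward zero.
    w = w - lr * (grad + l2 * w)
    # Small random perturbation that reinjects some randomness into the weights.
    return w + noise_std * rng.standard_normal(w.shape)

# Example: a zero gradient leaves only the shrinkage and the perturbation.
w = np.ones((3, 2))
w_next = sgd_step_l2_perturb(w, np.zeros_like(w), rng=np.random.default_rng(0))
```

With `noise_std=0` the step reduces to plain SGD with weight decay, which makes the two ingredients easy to test separately.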

What problem does this paper attempt to address?

The paper asks whether, and how, deep learning systems lose plasticity in a continual learning environment. Specifically, the authors examine whether a deep learning system that keeps receiving new data gradually loses its ability to learn new tasks, a phenomenon referred to as "loss of plasticity." This differs from "catastrophic forgetting," in which the system forgets old tasks when learning new ones; loss of plasticity means the system becomes progressively worse at learning from new data.

### Background of the Paper

Modern deep learning systems are usually designed to be trained once and then not trained again, which does not match the needs of a continual learning environment, where the system must keep learning from new data. Deep learning systems may encounter two major problems in such an environment:

1. **Catastrophic Forgetting**: The system forgets previously learned tasks when learning new ones.
2. **Loss of Plasticity**: The system gradually loses the ability to learn from new data.

### Research Objectives

The main objective of the paper is to verify experimentally whether modern deep learning systems lose plasticity in a continual learning environment and to explore the reasons for this phenomenon. The authors tested this hypothesis on several datasets and tasks, including:

- **ImageNet**: A large-scale image classification dataset used to generate thousands of binary classification tasks.
- **MNIST**: A handwritten digit recognition dataset used to generate multiple tasks through pixel permutations.
- **Slowly-Changing Regression**: An idealized regression problem used to further verify the loss of plasticity.

### Main Findings

1. **ImageNet Experiments**:
   - Over thousands of binary classification tasks, performance gradually declined, from 89% accuracy on an early task to 77% on the 2000th task, about the level of a linear network.
   - Different network architectures, optimizers, activation functions, and other hyperparameter settings all exhibited similar loss of plasticity.
2. **Permuted MNIST Experiments**:
   - With tasks generated by random pixel permutations, performance gradually declined as the number of tasks increased.
   - Different network sizes and task-switching frequencies were tested, and the results consistently showed a loss of plasticity.
3. **Slowly-Changing Regression Experiments**:
   - On a simpler regression problem, loss of plasticity was observed under different activation functions.

### Explanation and Solutions

The authors further explored the reasons for the loss of plasticity, suggesting that certain characteristics of the initial weight distribution (such as unit diversity, non-saturated units, and small weight values) gradually diminish during learning, leading to a decline in plasticity. L2-regularization, particularly when combined with weight perturbation, substantially eased the problem. To address it more fully, the authors proposed a new algorithm, **Continual Backpropagation**, which maintains the system's plasticity by reinitializing a small portion of low-utility hidden units after each training sample.

### Conclusion

The paper verified experimentally that modern deep learning systems lose plasticity in a continual learning environment and proposed a new algorithm to mitigate the problem. It provides empirical evidence and a practical algorithm that further work on continual learning can build on.
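The reinitialization step described under "Explanation and Solutions" can be sketched for a single hidden layer as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name `ContinualBackpropLayer`, the utility measure (a running average of each unit's absolute activation times the summed magnitude of its outgoing weights), the maturity threshold, and all default values are illustrative choices that the summary above does not specify.

```python
import numpy as np

class ContinualBackpropLayer:
    """One ReLU hidden layer with selective reinitialization (illustrative only)."""

    def __init__(self, n_in, n_hidden, n_out, replacement_rate=1e-3,
                 maturity=100, decay=0.99, seed=0):
        self.rng = np.random.default_rng(seed)
        self.bound = np.sqrt(1.0 / n_in)  # assumed initial weight range
        self.w_in = self.rng.uniform(-self.bound, self.bound, (n_in, n_hidden))
        self.w_out = self.rng.uniform(-self.bound, self.bound, (n_hidden, n_out))
        self.utility = np.zeros(n_hidden)      # running utility per hidden unit
        self.age = np.zeros(n_hidden, dtype=int)
        self.replacement_rate = replacement_rate
        self.maturity = maturity               # young units are protected
        self.decay = decay
        self.pending = 0.0                     # fractional replacements owed

    def forward(self, x):
        """Forward pass for a single example x of shape (n_in,)."""
        self.h = np.maximum(0.0, x @ self.w_in)
        return self.h @ self.w_out

    def reinit_step(self):
        """Call once per example, after the usual gradient update."""
        # Assumed utility: |activation| times total outgoing weight magnitude.
        contribution = np.abs(self.h) * np.abs(self.w_out).sum(axis=1)
        self.utility = self.decay * self.utility + (1 - self.decay) * contribution
        self.age += 1
        eligible = np.flatnonzero(self.age > self.maturity)
        self.pending += self.replacement_rate * eligible.size
        replaced = []
        while self.pending >= 1.0 and eligible.size > 0:
            unit = eligible[np.argmin(self.utility[eligible])]
            # Fresh input weights from the initial distribution; zero outgoing
            # weights so the new unit does not disturb the current output.
            self.w_in[:, unit] = self.rng.uniform(
                -self.bound, self.bound, self.w_in.shape[0])
            self.w_out[unit, :] = 0.0
            self.utility[unit] = 0.0
            self.age[unit] = 0
            eligible = eligible[eligible != unit]
            self.pending -= 1.0
            replaced.append(int(unit))
        return replaced
```

In a training loop, `forward` would be followed by an ordinary backpropagation update and then `reinit_step`. With a small `replacement_rate`, most calls replace nothing, and a unit is reinitialized only once enough fractional replacements have accumulated.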

Maintaining Plasticity in Deep Continual Learning

Loss of plasticity in deep continual learning

Progressive Learning without Forgetting

Maintaining Plasticity in Continual Learning via Regenerative Regularization

Loss of Plasticity in Continual Deep Reinforcement Learning

Maintaining Plasticity via Regenerative Regularization

A Study of Plasticity Loss in On-Policy Deep Reinforcement Learning

Addressing Loss of Plasticity and Catastrophic Forgetting in Continual Learning

Reinforced Continual Learning

Bio-inspired, task-free continual learning through activity regularization

Neuromimetic metaplasticity for adaptive continual learning

Disentangling the Causes of Plasticity Loss in Neural Networks

Understanding plasticity in neural networks

Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting

Learning Continually by Spectral Regularization

Plastic Learning with Deep Fourier Features

Continual Backprop: Stochastic Gradient Descent with Persistent Randomness

Sparsity and Heterogeneous Dropout for Continual Learning in the Null Space of Neural Activations

Self-Normalized Resets for Plasticity in Continual Learning

Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning