Shibhansh Dohare, J. Fernando Hernandez-Garcia, Parash Rahman, A. Rupam Mahmood, Richard S. Sutton
Abstract: Modern deep-learning systems are specialized to problem settings in which training occurs once and then never again, as opposed to continual-learning settings in which training occurs continually. If deep-learning systems are applied in a continual learning setting, then it is well known that they may fail to remember earlier examples. More fundamental, but less well known, is that they may also lose their ability to learn on new examples, a phenomenon called loss of plasticity. We provide direct demonstrations of loss of plasticity using the MNIST and ImageNet datasets repurposed for continual learning as sequences of tasks. In ImageNet, binary classification performance dropped from 89% accuracy on an early task down to 77%, about the level of a linear network, on the 2000th task. Loss of plasticity occurred with a wide range of deep network architectures, optimizers, activation functions, batch normalization, and dropout, but was substantially eased by L2-regularization, particularly when combined with weight perturbation. Further, we introduce a new algorithm, continual backpropagation, which slightly modifies conventional backpropagation to reinitialize a small fraction of less-used units after each example and appears to maintain plasticity indefinitely.
What problem does this paper attempt to address?
The paper asks whether, and why, deep learning systems lose plasticity in a continual learning environment: whether they gradually lose the ability to learn new tasks as new data keeps arriving. This "loss of plasticity" differs from "catastrophic forgetting." Catastrophic forgetting means the system's performance on old tasks degrades when it learns new ones, whereas loss of plasticity means the system becomes progressively worse at learning from new data.
### Background of the Paper
Modern deep learning systems are usually designed to be trained once and then used without further training. A continual learning environment is different: the system must keep learning from new data. Existing research shows that deep learning systems can run into two major problems in such an environment:
1. **Catastrophic Forgetting**: The system forgets previously learned tasks when learning new ones.
2. **Loss of Plasticity**: The system gradually loses the ability to learn from new data.
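To make the distinction concrete, both problems can be read off an accuracy matrix in which entry (i, j) is the accuracy on task j measured after training through task i. The sketch below is illustrative only: the function name `learned_and_forgotten` and the toy numbers are made up for this example and are not results from the paper.

```python
import numpy as np

def learned_and_forgotten(acc):
    """acc[i, j] = accuracy on task j after training on tasks 0..i (entries with j > i are unused)."""
    acc = np.asarray(acc, dtype=float)
    final = acc[-1]                 # accuracy on every task after the last task has been trained
    learned = np.diag(acc).copy()   # accuracy on each task right after it was trained (plasticity)
    forgotten = learned - final     # how much of that accuracy was lost later (forgetting)
    return learned, forgotten

# Toy, made-up numbers for three tasks.
toy = [[0.90, 0.00, 0.00],
       [0.60, 0.85, 0.00],
       [0.55, 0.70, 0.80]]
learned, forgotten = learned_and_forgotten(toy)
print("learned:", learned)
print("forgotten:", forgotten)
```

A steadily falling `learned` sequence is the signature of loss of plasticity, while large entries in `forgotten` indicate catastrophic forgetting; a system can suffer from either one without the other.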
### Research Objectives
The main objective of the paper is to experimentally verify whether modern deep learning systems indeed lose plasticity in a continual learning environment and to explore the reasons for this phenomenon. The authors used multiple datasets and tasks to test this hypothesis, including:
- **ImageNet**: A large-scale image classification dataset used to generate thousands of binary classification tasks.
- **MNIST**: A handwritten digit recognition dataset used to generate multiple tasks through pixel permutations.
- **Slowly-Changing Regression**: An idealized regression problem used to further verify the loss of plasticity.
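The two classification benchmarks turn a single dataset into a sequence of tasks. A minimal sketch of both constructions follows, assuming images are already flattened to rows of length 784 (28 × 28 MNIST pixels) and that each ImageNet task is a binary problem between two distinct classes. The function names are hypothetical, the random array stands in for real data, and the authors' exact preprocessing and class-pairing scheme may differ.

```python
import numpy as np

def make_permuted_task(images, rng):
    """Apply one fixed random pixel permutation to every image (rows of `images`).

    All images in a task share the permutation; each new task draws a fresh one.
    """
    perm = rng.permutation(images.shape[1])
    return images[:, perm], perm

def make_binary_class_pairs(num_classes, num_tasks, rng):
    """Draw `num_tasks` pairs of distinct class labels; each pair defines one binary task."""
    pairs = []
    for _ in range(num_tasks):
        a, b = rng.choice(num_classes, size=2, replace=False)
        pairs.append((int(a), int(b)))
    return pairs

rng = np.random.default_rng(0)
stand_in_images = rng.random((5, 784))  # placeholder for real MNIST images
tasks = [make_permuted_task(stand_in_images, rng)[0] for _ in range(3)]
pairs = make_binary_class_pairs(num_classes=1000, num_tasks=2000, rng=rng)
```

A permutation leaves each image's pixel values intact but scrambles their spatial layout, so every task is equally hard yet requires different features; this is what makes the sequence a clean test of whether the network can keep learning.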
### Main Findings
1. **ImageNet Experiments**:
   - Over a sequence of 2000 binary classification tasks, accuracy fell from 89% on an early task to 77% on the 2000th task, about the level of a linear network.
   - Loss of plasticity appeared across a wide range of network architectures, optimizers, and activation functions, and persisted with batch normalization and dropout.
2. **Permuted MNIST Experiments**:
   - With each task defined by a random pixel permutation, performance declined steadily as the number of tasks increased.
   - Experiments with different network sizes and task-switching frequencies consistently showed loss of plasticity.
3. **Slowly-Changing Regression Experiments**:
   - A simpler, idealized regression problem confirmed loss of plasticity across different activation functions.
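The abstract notes that L2 regularization, particularly when combined with weight perturbation, substantially eased loss of plasticity. One simple way to combine the two in a gradient step is sketched below: the L2 term shrinks weights toward zero, and freshly drawn Gaussian noise is added after every update. The function name, the dict-of-arrays parameter format, and the values of `lr`, `weight_decay`, and `noise_std` are assumptions for illustration, and the perturbation scheme the authors used may differ.

```python
import numpy as np

def l2_perturb_step(weights, grads, rng, lr=0.01, weight_decay=1e-4, noise_std=1e-3):
    """One SGD step with L2 regularization plus small Gaussian weight noise (hypothetical helper).

    weights, grads: dicts mapping parameter names to arrays of the same shape.
    """
    new = {}
    for name, w in weights.items():
        # L2 regularization adds weight_decay * w to the gradient, shrinking weights toward zero.
        g = grads[name] + weight_decay * w
        # Perturbation: zero-mean Gaussian noise injected after every update.
        new[name] = w - lr * g + noise_std * rng.standard_normal(w.shape)
    return new
```

The shrinkage counteracts weight growth, and the noise keeps reintroducing some randomness; both act on properties of the initial weights that, according to the paper's explanation, tend to fade during long training.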
### Explanation and Solutions
The authors further investigated why plasticity is lost. They suggest that certain properties of the initial weight distribution, such as diversity among units, non-saturated units, and small weight magnitudes, gradually diminish during learning, which reduces plasticity. To address this, they proposed a new algorithm, **Continual Backpropagation**, which maintains plasticity by reinitializing a small fraction of low-utility hidden units after each training example.
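The idea of continual backpropagation can be sketched for a single ReLU hidden layer trained with plain backprop on squared error. This is a simplified illustration, not the authors' implementation: the class name `ContinualBackpropLayer`, the utility measure (a running average of |activation| times the unit's total outgoing-weight magnitude), the maturity threshold that protects newly reset units, the replacement rate, and the He-style initializer are all assumptions.

```python
import numpy as np

class ContinualBackpropLayer:
    """Hypothetical one-hidden-layer net: backprop step followed by selective reinitialization."""

    def __init__(self, n_in, n_hidden, n_out, rng, replacement_rate=1e-4,
                 maturity_threshold=100, decay=0.99):
        self.rng, self.n_in = rng, n_in
        self.w_in = self._init_in(n_hidden)            # [n_in, n_hidden]
        self.b_in = np.zeros(n_hidden)
        self.w_out = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, n_out))
        self.utility = np.zeros(n_hidden)
        self.age = np.zeros(n_hidden, dtype=int)
        self.replacement_rate = replacement_rate
        self.maturity_threshold = maturity_threshold
        self.decay = decay
        self.to_replace = 0.0                          # fractional count of units owed a reset

    def _init_in(self, k):
        # Fresh input weights from the initial distribution (assumed He-style normal).
        return self.rng.normal(0.0, np.sqrt(2.0 / self.n_in), size=(self.n_in, k))

    def forward(self, x):
        h = np.maximum(0.0, x @ self.w_in + self.b_in)
        return h, h @ self.w_out

    def sgd_step(self, x, y, lr=0.01):
        """Plain backprop on 0.5 * batch-mean squared error, then selective reinitialization."""
        h, y_hat = self.forward(x)
        err = (y_hat - y) / x.shape[0]                 # dL/dy_hat
        grad_w_out = h.T @ err
        dh = (err @ self.w_out.T) * (h > 0)            # backprop through the ReLU
        self.w_out -= lr * grad_w_out
        self.w_in -= lr * (x.T @ dh)
        self.b_in -= lr * dh.sum(axis=0)
        return self.reinit_low_utility(h)

    def reinit_low_utility(self, h):
        # Utility: running average of |activation| times total outgoing weight magnitude.
        contrib = np.abs(h).mean(axis=0) * np.abs(self.w_out).sum(axis=1)
        self.utility = self.decay * self.utility + (1 - self.decay) * contrib
        self.age += 1
        mature = np.flatnonzero(self.age > self.maturity_threshold)
        if mature.size == 0:
            return []
        self.to_replace += self.replacement_rate * mature.size
        n = int(self.to_replace)
        if n == 0:
            return []
        self.to_replace -= n
        worst = mature[np.argsort(self.utility[mature])[:n]]  # lowest-utility mature units
        self.w_in[:, worst] = self._init_in(worst.size)
        self.b_in[worst] = 0.0
        self.w_out[worst, :] = 0.0                     # zeroed so a reset does not change the output
        self.utility[worst] = 0.0
        self.age[worst] = 0
        return worst.tolist()
```

Because only a tiny fraction of units is reset at each step and their outgoing weights start at zero, the network's current predictions are barely disturbed, while a steady trickle of freshly initialized units keeps some of the properties of the initial weights alive.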
### Conclusion
The paper demonstrates experimentally that modern deep learning systems lose plasticity in a continual learning environment and proposes a new algorithm, continual backpropagation, to mitigate the problem. By separating loss of plasticity from catastrophic forgetting and offering a practical remedy, the work lays groundwork for further research in continual learning.