Slowing Down Forgetting in Continual Learning

Pascal Janetzky, Tobias Schlagenhauf, Stefan Feuerriegel
2024-11-11
Abstract: A common challenge in continual learning (CL) is catastrophic forgetting, where the performance on old tasks drops after new, additional tasks are learned. In this paper, we propose a novel framework called ReCL to slow down forgetting in CL. Our framework exploits an implicit bias of gradient-based neural networks due to which these converge to margin maximization points. Such convergence points allow us to reconstruct old data from previous tasks, which we then combine with the current training data. Our framework is flexible and can be applied on top of existing, state-of-the-art CL methods to slow down forgetting. We further demonstrate the performance gain from our framework across a large series of experiments, including different CL scenarios (class incremental, domain incremental, task incremental learning), different datasets (MNIST, CIFAR10), and different network architectures. Across all experiments, we find large performance gains through ReCL. To the best of our knowledge, our framework is the first to address catastrophic forgetting by leveraging models in CL as their own memory buffers.
Machine Learning, Artificial Intelligence, Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper addresses catastrophic forgetting in continual learning (CL): when a model learns a new task, its performance on earlier tasks declines significantly. This typically happens because the parameter updates that adapt the model to new data overwrite the knowledge it acquired for the old tasks.

### Main contributions of the paper

1. **A new CL framework (ReCL)**:
   - The framework exploits an implicit bias of gradient-based neural network training: such networks converge to margin-maximization points.
   - These convergence points allow old data to be reconstructed from the model weights. The reconstructed data is then combined with the data of the current task for training.
2. **The model as its own memory buffer**:
   - Unlike other methods, ReCL does not rely on external storage to save old data, nor does it require training an additional generative model.
   - Instead, it reconstructs old data through the implicit bias of the model itself.
3. **Flexibility and wide applicability**:
   - ReCL can be applied on top of existing state-of-the-art CL methods to further slow down forgetting and improve performance.
   - The authors verify this experimentally across standard CL scenarios (class-incremental, domain-incremental, and task-incremental learning), different datasets (MNIST, CIFAR10), and different network architectures (multi-layer perceptron, convolutional neural network).

### Experimental results

- **Performance improvement**:
  - ReCL improves performance consistently in all experiments, especially on the ACC (average accuracy) and BWT (backward transfer) metrics.
  - Even without any other CL method, ReCL on its own significantly improves performance.
- **Sensitivity analysis**:
  - Increasing the number of reconstructed samples further improves performance, but even with only 10 reconstructed samples, ReCL already outperforms the default baseline.
  - Different hyperparameter tuning strategies (unsupervised, supervised) bring further improvements, among which the unsupervised strategy performs best in all experiments.

### Conclusion

ReCL slows down catastrophic forgetting in continual learning by using the implicit bias of the model to reconstruct old data. The framework performs well in multiple CL scenarios, and it is flexible enough to be combined with other CL methods to further improve performance.
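The ACC and BWT metrics mentioned in the experimental results can be illustrated with a short sketch. It assumes the definitions commonly used in the continual learning literature (ACC as the mean accuracy over all tasks after the final task, BWT as the mean change in accuracy on earlier tasks between the time each was learned and the end of training); the paper's exact definitions may differ. The function names and the accuracy matrix are hypothetical and the numbers are made up for illustration, not taken from the paper.

```python
from typing import Sequence


def average_accuracy(acc: Sequence[Sequence[float]]) -> float:
    """ACC: mean accuracy over all tasks after training on the final task."""
    final_row = acc[-1]
    return sum(final_row) / len(final_row)


def backward_transfer(acc: Sequence[Sequence[float]]) -> float:
    """BWT: mean change in accuracy on each earlier task between the moment
    it was learned and the end of training. Negative values mean forgetting."""
    num_tasks = len(acc)
    if num_tasks < 2:
        raise ValueError("BWT needs at least two tasks")
    deltas = [acc[-1][i] - acc[i][i] for i in range(num_tasks - 1)]
    return sum(deltas) / len(deltas)


# Hypothetical accuracy matrix: acc_matrix[t][i] is the accuracy on task i
# after training on tasks 0..t (illustrative values only).
acc_matrix = [
    [0.90, 0.10, 0.10],
    [0.70, 0.80, 0.10],
    [0.50, 0.60, 0.90],
]

print(f"ACC = {average_accuracy(acc_matrix):.3f}")   # ACC = 0.667
print(f"BWT = {backward_transfer(acc_matrix):.3f}")  # BWT = -0.300
```

In this toy example the model ends with 50% and 60% accuracy on the first two tasks, down from 90% and 80% right after learning them, so BWT is negative. Under these definitions, a method that slows down forgetting should move BWT toward zero and raise ACC.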