Abstract: Building learning agents that can progressively learn and accumulate knowledge is the core goal of the continual learning (CL) research field. Unfortunately, training a model on new data usually degrades its performance on past data. In the CL literature, this effect is referred to as catastrophic forgetting (CF). CF has been widely studied, and a plethora of methods have been proposed to address it on short sequences of non-overlapping tasks. In such setups, CF always leads to a quick and significant drop in performance on past tasks. Nevertheless, despite CF, recent work showed that SGD training on linear models accumulates knowledge in a CL regression setup. This phenomenon becomes especially visible when tasks reoccur. We might then wonder whether DNNs trained with SGD, or with any standard gradient-based optimization method, accumulate knowledge in the same way. Such a phenomenon would have interesting consequences for applying DNNs to real continual scenarios, since standard gradient-based optimization methods are significantly less computationally expensive than existing CL algorithms. In this paper, we study progressive knowledge accumulation (KA) in DNNs trained with gradient-based algorithms on long sequences of tasks with data reoccurrence. We propose a new framework, SCoLe (Scaling Continual Learning), to investigate KA, and we find that catastrophic forgetting has a limited effect on DNNs trained with SGD. When a model is trained on long sequences in which data reoccurs sparsely, its overall accuracy improves, which might be counter-intuitive given the CF phenomenon. We empirically investigate KA in DNNs under various data occurrence frequencies and propose simple and scalable strategies to increase knowledge accumulation in DNNs.
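To make the setting of long task sequences with sparse data reoccurrence more concrete, the following is a minimal sketch of how such a sequence could be generated. It is not the SCoLe implementation. It assumes, purely for illustration, that each task is a small random subset of classes drawn from a fixed pool, so that any given class reappears only occasionally. The names `make_task_sequence` and `occurrence_frequency`, and the parameters `num_classes`, `classes_per_task`, and `num_tasks`, are hypothetical.

```python
import random
from collections import Counter


def make_task_sequence(num_classes, classes_per_task, num_tasks, seed=0):
    """Build a long sequence of tasks (assumed setup, not the paper's code).

    Each task is a sorted tuple of distinct class ids sampled uniformly
    without replacement from the pool of `num_classes` classes.
    """
    rng = random.Random(seed)
    pool = range(num_classes)
    return [
        tuple(sorted(rng.sample(pool, classes_per_task)))
        for _ in range(num_tasks)
    ]


def occurrence_frequency(tasks, num_classes):
    """Fraction of tasks in which each class appears."""
    counts = Counter(c for task in tasks for c in task)
    return [counts[c] / len(tasks) for c in range(num_classes)]


# Example: 10 classes, 2 classes per task, 500 tasks.
tasks = make_task_sequence(num_classes=10, classes_per_task=2, num_tasks=500)
freq = occurrence_frequency(tasks, num_classes=10)
```

Under these assumptions each class is expected to appear in about `classes_per_task / num_classes` of the tasks, so lowering `classes_per_task` or raising `num_classes` makes reoccurrence sparser. A model trained with plain SGD on the tasks in order, and evaluated on all classes after each task, could then be used to track knowledge accumulation over the sequence.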
