Abstract: Learning from a sequence of tasks over a lifetime is essential for an agent to progress toward artificial general intelligence. Despite the rapid growth of this research field in recent years, most work focuses on the well-known catastrophic forgetting issue. In contrast, this work explores knowledge-transferable lifelong learning without storing historical data or incurring significant additional computational overhead. We demonstrate that existing data-free frameworks, including regularization-based single-network and structure-based multinetwork frameworks, face a fundamental issue of lifelong learning named anterograde forgetting: preserving and transferring memory may inhibit the learning of new knowledge. We attribute this to two causes. First, the capacity of the learning network decreases as it memorizes historical knowledge. Second, conceptual confusion arises between irrelevant old knowledge and the current task. Inspired by the complementary learning theory in neuroscience, we endow artificial neural networks with the ability to learn continuously without forgetting while recalling historical knowledge to facilitate learning new knowledge. Specifically, this work proposes a general framework named cycle memory networks (CMNs). A CMN consists of two individual memory networks, which store short- and long-term memories separately to avoid capacity shrinkage, and a transfer cell between them. The transfer cell enables knowledge transfer from the long-term to the short-term memory network to mitigate conceptual confusion. In addition, a memory consolidation mechanism integrates short-term knowledge into the long-term memory network for knowledge accumulation. We demonstrate that the CMN can effectively address anterograde forgetting on several task-related, task-conflict, class-incremental, and cross-domain benchmarks. Furthermore, we provide extensive ablation studies to verify each component of the framework. The source code is available at: https://github.com/GeoX-Lab/CMN.
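The abstract describes the CMN layout only at a high level. The following is a minimal, hypothetical sketch of the three moving parts it names: two separate memory networks, a transfer cell, and memory consolidation. It is written in plain Python under several assumptions:

- Both memory networks are linear models.
- The transfer cell is a fixed scalar gate (`gate`) that adds the long-term network's output to the short-term one.
- Consolidation is L2-regularized distillation from the combined model, using current-task inputs only.

These are illustrative stand-ins, not the method in the paper or the released code. The names `CycleMemorySketch`, `learn_task`, and `consolidate` are hypothetical.

```python
from typing import List, Sequence, Tuple

# (input features, target) pair for a toy regression task
Sample = Tuple[List[float], float]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


class CycleMemorySketch:
    """Hypothetical toy short/long-term memory layout (not the CMN implementation).

    Assumptions: linear "networks", a fixed scalar transfer gate, and
    consolidation by L2-regularized distillation on current-task inputs.
    """

    def __init__(self, dim: int, gate: float = 0.5) -> None:
        self.long_w = [0.0] * dim   # long-term memory network
        self.short_w = [0.0] * dim  # short-term memory network
        self.gate = gate            # hypothetical transfer-cell strength

    def predict(self, x: Sequence[float]) -> float:
        # Short-term output plus gated recall from the long-term network.
        return dot(self.short_w, x) + self.gate * dot(self.long_w, x)

    def learn_task(self, data: List[Sample], lr: float = 0.05, epochs: int = 200) -> None:
        # A fresh short-term network for each task, so its capacity never shrinks.
        self.short_w = [0.0] * len(self.long_w)
        for _ in range(epochs):
            for x, y in data:
                err = self.predict(x) - y
                # Only the short-term weights are updated; long-term memory stays frozen.
                self.short_w = [w - lr * err * xi for w, xi in zip(self.short_w, x)]

    def consolidate(self, data: List[Sample], lr: float = 0.05,
                    epochs: int = 200, lam: float = 0.01) -> None:
        # Teacher targets come from the combined model on current-task inputs only,
        # so no historical data needs to be stored.
        targets = [self.predict(x) for x, _ in data]
        anchor = list(self.long_w)  # old long-term weights, kept close via an L2 penalty
        for _ in range(epochs):
            for (x, _), t in zip(data, targets):
                err = dot(self.long_w, x) - t
                self.long_w = [w - lr * (err * xi + lam * (w - a))
                               for w, xi, a in zip(self.long_w, x, anchor)]


if __name__ == "__main__":
    task = [([1.0, 0.0], 2.0), ([0.0, 1.0], 1.0), ([1.0, 1.0], 3.0)]
    model = CycleMemorySketch(dim=2)
    model.learn_task(task)
    print([round(model.predict(x), 2) for x, _ in task])
    model.consolidate(task)
    print([round(w, 2) for w in model.long_w])
```

In this toy, `short_w` is re-initialized for every task, which is what keeps the learner's capacity from shrinking. The frozen `long_w` is only read through the gate while a task is learned and only written during consolidation.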

Lifelong Learning With Cycle Memory Networks