Abstract: Continual learning (CL) studies how to learn new knowledge continuously from data streams without forgetting previously acquired knowledge. A key problem is catastrophic forgetting: the model's performance on previous tasks declines significantly after it learns a subsequent task. Several studies address this by replaying samples stored in a buffer while training on new tasks. However, the data imbalance between old- and new-task samples causes two serious problems: information suppression and weak feature discriminability. The former means that information in the abundant new-task samples suppresses information in the comparatively few old-task samples; the resulting biased output makes the model's output for the same sample less consistent over time, which harms knowledge retention. The latter means that the feature representation is biased toward the new task and is not discriminative enough to separate old and new tasks. To this end, we build an imbalance mitigation for CL (IMCL) framework that incorporates a decoupled knowledge distillation (DKD) approach and a dual enhanced contrastive learning (DECL) approach to tackle both problems. Specifically, the DKD approach alleviates the suppression of old tasks by the new task by decoupling the model's output probability during the replay stage, which better preserves old-task knowledge. The DECL approach enhances both low- and high-level features and fuses the enhanced features to construct a contrastive loss that effectively distinguishes different tasks. Extensive experiments on three popular datasets show that our method achieves promising performance under the task incremental learning (Task-IL), class incremental learning (Class-IL), and domain incremental learning (Domain-IL) settings.
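
The abstract does not give the DKD formulation, so the following is only an illustrative sketch of one possible reading of "decoupling the model output probability": old-class logits are normalised on their own before distillation, so a large new-class logit cannot shrink the old-class probabilities. The function names, the `num_old` split, and the temperature value are assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def coupled_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Baseline: distill over all classes with one shared softmax."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return kl_divergence(p_teacher, p_student)


def decoupled_old_class_kd_loss(student_logits, teacher_logits, num_old,
                                temperature=2.0):
    """Hypothetical decoupled variant: distill only over the first
    `num_old` (old-task) classes, normalised separately from new classes."""
    p_teacher = softmax(teacher_logits[:num_old], temperature)
    p_student = softmax(student_logits[:num_old], temperature)
    return kl_divergence(p_teacher, p_student)


# Two old classes followed by two new classes (assumed layout).
teacher = [2.0, 1.0, 0.0, 0.0]
# The student agrees on the old classes but has a dominant new-class logit.
student = [2.0, 1.0, 8.0, 0.0]

coupled = coupled_kd_loss(student, teacher)
decoupled = decoupled_old_class_kd_loss(student, teacher, num_old=2)
```

In this toy case the coupled loss is positive even though the student's old-class logits match the teacher's exactly, because the new-class logit dominates the shared softmax; the decoupled loss is zero. Whether the paper's DKD uses this particular split is not stated in the abstract.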

Densely Distilling Cumulative Knowledge for Continual Learning

Continual Learning With Knowledge Distillation: A Survey

Knowledge Condensation Distillation

Memory Efficient Data-Free Distillation for Continual Learning

Deep Collective Knowledge Distillation

Adaptively Integrated Knowledge Distillation and Prediction Uncertainty for Continual Learning

Residual Error Based Knowledge Distillation

Semi-Online Knowledge Distillation

An Embarrassingly Simple Approach for Knowledge Distillation

DDK: Distilling Domain Knowledge for Efficient Large Language Models

Imbalance Mitigation for Continual Learning via Knowledge Decoupling and Dual Enhanced Contrastive Learning

Channel-wise Knowledge Distillation for Dense Prediction

Online Knowledge Distillation via Collaborative Learning

Collaborative Knowledge Distillation Via Multiknowledge Transfer

Online Knowledge Distillation with Diverse Peers

Continual Distillation Learning: An Empirical Study of Knowledge Distillation in Prompt-based Continual Learning

Deeply-Supervised Knowledge Distillation

Knowledge Distillation with Deep Supervision

Continual Federated Learning Based on Knowledge Distillation

Stage-by-stage Knowledge Distillation

M2KD: Multi-model and Multi-level Knowledge Distillation for Incremental Learning