Continual Learning With Knowledge Distillation: A Survey

Songze Li, Tonghua Su, Xu-Yao Zhang, Zhongjie Wang
DOI: https://doi.org/10.1109/TNNLS.2024.3476068
2024-10-18
Abstract: The foremost challenge in continual learning is to mitigate catastrophic forgetting, allowing a model to retain knowledge of previous tasks while learning new ones. Knowledge distillation (KD), a form of regularization, has gained significant attention for its ability to maintain a model's performance on previous tasks: while learning new tasks, the model mimics the outputs of earlier models, which reduces forgetting. This article offers a comprehensive survey of continual learning methods that employ KD in image classification. We provide a detailed analysis of how KD is used in continual learning methods, categorizing its application into three distinct paradigms. We also classify these methods by the type of knowledge source used and examine, from the perspective of loss functions, how KD consolidates memory in continual learning. In addition, we conduct extensive experiments on CIFAR-100, TinyImageNet, and ImageNet-100 across ten KD-integrated continual learning methods to analyze the role of KD in continual learning, and we further discuss its effectiveness in other continual learning tasks. Our experimental evidence demonstrates that KD plays a crucial role in mitigating forgetting in continual learning. It also substantiates that, when KD is used with data replay, classification bias adversely affects the effectiveness of KD, whereas employing a separated softmax loss can significantly enhance its efficacy.
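To make the idea of "mimicking the outputs of earlier models" concrete, the following is a minimal sketch of one common form of KD in continual learning, namely distillation on the logits of the old classes. It is an illustration under stated assumptions, not the survey's own formulation: the function names (`distillation_loss`, `total_loss`), the temperature of 2.0, and the weighting `kd_weight` are all hypothetical choices made for this example.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis, with temperature scaling."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(old_logits, new_logits, temperature=2.0):
    """KL divergence between the frozen old model's softened outputs and the
    new model's softened outputs, restricted to the old classes.

    Assumption: the first old_logits.shape[-1] columns of new_logits
    correspond to the classes the old model was trained on.
    """
    n_old = old_logits.shape[-1]
    p_old = softmax(old_logits, temperature)
    p_new = softmax(new_logits[..., :n_old], temperature)
    eps = 1e-12  # avoids log(0)
    kl = np.sum(p_old * (np.log(p_old + eps) - np.log(p_new + eps)), axis=-1)
    return float(kl.mean())

def total_loss(new_logits, labels, old_logits, kd_weight=1.0, temperature=2.0):
    """Cross-entropy on the current labels plus a weighted KD term (hypothetical weighting)."""
    p = softmax(new_logits)
    ce = -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    kd = distillation_loss(old_logits, new_logits, temperature)
    return float(ce + kd_weight * kd)

# Toy batch: 2 samples, 3 old classes, 2 new classes.
old_logits = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]])
new_logits = np.array([[1.0, 0.7, -0.5, 2.0, 0.0], [0.0, 1.0, 0.2, -1.0, 3.0]])
labels = np.array([3, 4])
loss = total_loss(new_logits, labels, old_logits)
```

The KD term is zero when the new model reproduces the old model's logits on the old classes and grows as the two diverge, which is the sense in which it acts as a regularizer against forgetting.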