Mutual Mentor: Online Contrastive Distillation Network for General Continual Learning

Qiang Wang, Zhong Ji, Jin Li, Yanwei Pang
DOI: https://doi.org/10.1016/j.neucom.2023.03.066
IF: 6
2023-04-02
Neurocomputing
Abstract: The goal of General Continual Learning (GCL) is to preserve learned knowledge and learn new knowledge with constant memory from an infinite data stream where task boundaries are blurry. Distilling the model's responses on reserved samples between the old and new models is an effective way to achieve promising performance on GCL. However, it accumulates the old model's inherent response bias and is not robust to model changes. To this end, we propose a Mutual Mentor General Continual Learning (MMGCL) framework to tackle these problems, which explores a training process in which the student and teacher models mentor each other. Concretely, the student model consolidates the learned knowledge by aligning its relation and adaptive responses with those of the teacher model, while the teacher model updates its parameters by integrating the parameters of the student model to accumulate new knowledge. To further improve the effectiveness of the mutual mentoring, we integrate inter-instance knowledge to optimize the outputs of the teacher model, which not only supervises the student model but also indirectly optimizes the teacher model. Extensive experiments on six benchmark datasets demonstrate that MMGCL significantly outperforms state-of-the-art approaches under diverse continual learning settings with various buffer sizes.
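The two directions of the mutual mentoring described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the teacher integrates the student's parameters or which loss aligns the responses, so the sketch assumes an exponential moving average for the teacher update and a mean squared error for the response alignment. The names `ema_update`, `response_alignment_loss`, and `decay` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def ema_update(teacher: Dict[str, List[float]],
               student: Dict[str, List[float]],
               decay: float = 0.999) -> None:
    """Blend student weights into the teacher in place.

    Assumption: an exponential moving average stands in for the
    parameter integration step, which the abstract does not specify.
    """
    for name, student_values in student.items():
        teacher_values = teacher[name]
        for i, s in enumerate(student_values):
            teacher_values[i] = decay * teacher_values[i] + (1.0 - decay) * s


def response_alignment_loss(student_logits: List[float],
                            teacher_logits: List[float]) -> float:
    """Mean squared error between student and teacher responses.

    Assumption: MSE stands in for the response alignment term,
    whose exact form the abstract does not give.
    """
    if len(student_logits) != len(teacher_logits):
        raise ValueError("logit vectors must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_logits, teacher_logits)]
    return sum(squared) / len(squared)


# Toy usage: one parameter tensor flattened to a list of floats.
teacher_params = {"w": [0.0, 1.0]}
student_params = {"w": [1.0, 1.0]}
ema_update(teacher_params, student_params, decay=0.5)
loss = response_alignment_loss([1.0, 2.0], [1.0, 0.0])
```

In this sketch the student would be trained on a loss that includes the alignment term, and the teacher would be refreshed from the student after each step, so knowledge flows in both directions. The relation alignment and the inter-instance optimization of the teacher outputs are not covered here.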
computer science, artificial intelligence