Continual Learning of Image Classes with Language Guidance from a Vision-Language Model

Wentao Zhang, Yujun Huang, Weizhuo Zhang, Tong Zhang, Qicheng Lao, Yue Yu, Wei-Shi Zheng, Ruixuan Wang
DOI: https://doi.org/10.1109/tcsvt.2024.3449109
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Current deep learning models often catastrophically forget the knowledge of old classes when continually learning new ones. State-of-the-art approaches to continual learning of image classes often retain a small subset of old data to partly alleviate catastrophic forgetting, and their performance degrades sharply when no old data can be stored because of privacy or safety concerns. In this study, inspired by the way humans acquire visual knowledge with the help of language, we propose a novel continual learning framework based on a pre-trained vision-language model (VLM) that retains no old data. Rich prior knowledge of each new image class is encoded by the frozen text encoder of the VLM and is then used to guide the learning of that class. Because the output space of the frozen text encoder stays unchanged throughout continual learning, image representations of different classes become comparable during inference, even when the classes are learned at different times. Extensive empirical evaluations on multiple image classification datasets under various settings confirm the superior performance of our method over existing ones. The source code is available at https://github.com/Fatflower/CIL LG VLM/.
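The abstract's central mechanism is that each class is anchored to a fixed vector in the frozen text encoder's output space, so classes learned in different tasks can be scored against one another without replaying old images. The sketch below illustrates only that idea, under stated assumptions. The names `TextAnchoredClassifier` and `guidance_loss` are hypothetical. The small 3-D vectors are toy stand-ins for real text-encoder outputs, and the CLIP-style cosine-similarity cross-entropy (with an assumed temperature of 0.07) is a common choice that may differ from the paper's actual training objective.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


class TextAnchoredClassifier:
    """Hypothetical sketch: each class is represented by a frozen text embedding."""

    def __init__(self) -> None:
        self.anchors: Dict[str, List[float]] = {}

    def add_task(self, class_text_embeddings: Dict[str, Sequence[float]]) -> None:
        # New classes are appended. Anchors of earlier classes are never modified,
        # and no images from earlier tasks are stored.
        for name, emb in class_text_embeddings.items():
            if name in self.anchors:
                raise ValueError(f"class {name!r} was already learned")
            self.anchors[name] = l2_normalize(emb)

    def logits(self, image_feature: Sequence[float],
               temperature: float = 0.07) -> Dict[str, float]:
        # Every class seen so far is scored in the same fixed text space,
        # which is what makes classes from different tasks comparable.
        return {name: cosine(image_feature, a) / temperature
                for name, a in self.anchors.items()}

    def predict(self, image_feature: Sequence[float]) -> str:
        scores = self.logits(image_feature)
        return max(scores, key=scores.get)


def guidance_loss(image_feature: Sequence[float], label: str,
                  clf: TextAnchoredClassifier, temperature: float = 0.07) -> float:
    # Assumed objective: cross-entropy over cosine logits. Minimizing it pulls the
    # image feature toward its class's text anchor and away from other anchors.
    scores = clf.logits(image_feature, temperature)
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return log_z - scores[label]
```

For example, classes learned in two separate tasks can be compared directly at inference time:

```python
clf = TextAnchoredClassifier()
clf.add_task({"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]})  # task 1
clf.add_task({"car": [0.0, 0.0, 1.0]})                          # task 2, no task-1 images kept
clf.predict([0.1, 0.2, 0.9])  # "car"
clf.predict([0.9, 0.1, 0.0])  # "cat"
```

In this sketch the anchors are kept fixed, mirroring the frozen text encoder. Only the image-side features (not shown) would be trained, so adding a task never shifts the targets of earlier classes.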