CoLeCLIP: Open-Domain Continual Learning via Joint Task Prompt and Vocabulary Learning

Yukun Li, Guansong Pang, Wei Suo, Chenchen Jing, Yuling Xi, Lingqiao Liu, Hao Chen, Guoqiang Liang, Peng Wang
2024-03-15
Abstract: This paper explores the problem of continual learning (CL) of vision-language models (VLMs) in open domains, where the models need to perform continual updating and inference on a streaming of datasets from diverse seen and unseen domains with novel classes. Such a capability is crucial for various applications in open environments, e.g., AI assistants, autonomous driving systems, and robotics. Current CL studies mostly focus on closed-set scenarios in a single domain with known classes. Large pre-trained VLMs like CLIP have demonstrated superior zero-shot recognition ability, and a number of recent studies leverage this ability to mitigate catastrophic forgetting in CL, but they focus on closed-set CL in a single domain dataset. Open-domain CL of large VLMs is significantly more challenging due to 1) large class correlations and domain gaps across the datasets and 2) the forgetting of zero-shot knowledge in the pre-trained VLMs in addition to the knowledge learned from the newly adapted datasets. In this work we introduce a novel approach, termed CoLeCLIP, that learns an open-domain CL model based on CLIP. It addresses these challenges by a joint learning of a set of task prompts and a cross-domain class vocabulary. Extensive experiments on 11 domain datasets show that CoLeCLIP outperforms state-of-the-art methods for open-domain CL under both task- and class-incremental learning settings.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses open-domain continual learning (ODCL) for vision-language models (VLMs): a model must be updated and must run inference continually on a stream of datasets from diverse seen and unseen domains that contain novel classes. Traditional continual learning (CL) mostly covers closed-set scenarios with known classes in a single domain. ODCL is harder for two reasons: the datasets exhibit large class correlations and domain gaps, and adapting a large pre-trained VLM risks forgetting both its zero-shot knowledge and the knowledge learned from newly adapted datasets.

To address this problem, the paper introduces CoLeCLIP, a CLIP-based approach that jointly learns a set of task prompts and a cross-domain class vocabulary. The task prompts capture domain-specific patterns. A parameter-efficient fine-tuning (PEFT) module and the cross-domain class vocabulary are used to avoid forgetting both the zero-shot recognition capability of the pre-trained model and the knowledge adapted from new tasks.

Experiments on 11 domain datasets show that CoLeCLIP outperforms existing methods under both task-incremental learning (TIL) and class-incremental learning (CIL) settings. Compared to existing continual learning methods, CoLeCLIP is also more lightweight and does not require large-scale external datasets for knowledge distillation, which reduces resource and computation time requirements.

In summary, the main contributions of the paper are:
1. Introducing the problem of open-domain continual learning, which requires recognizing known and novel classes in seen and unseen domains while preserving both the zero-shot knowledge from pre-training and the new knowledge learned from downstream tasks.
2. Proposing the lightweight yet effective CoLeCLIP approach, which addresses the unique challenges of open-domain CL through joint learning of task prompts and class embeddings.
3. Conducting extensive experiments on 11 domain datasets, demonstrating that CoLeCLIP outperforms state-of-the-art methods under both task- and class-incremental learning settings.
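
The difference between the TIL and CIL settings mentioned above can be illustrated with a toy sketch. This is not CoLeCLIP's implementation: the embeddings, class names, task names, and the helper `predict` are all hypothetical, and a plain cosine-similarity lookup stands in for a real CLIP-style matcher. It only shows how the candidate label space changes: TIL restricts prediction to the classes of a known task, while CIL searches every class seen so far, so correlated classes from other datasets can compete.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def predict(image_emb, vocabulary, allowed=None):
    """Return the class whose embedding is most similar to image_emb.

    allowed=None searches the whole vocabulary (CIL-style);
    passing a list of class names restricts the search (TIL-style).
    """
    names = list(vocabulary) if allowed is None else list(allowed)
    return max(names, key=lambda name: cosine(image_emb, vocabulary[name]))


# Hypothetical class vocabulary: class name -> made-up 3-d embedding.
vocabulary = {
    "airliner": [1.0, 0.0, 0.2],
    "biplane": [0.8, 0.1, 0.6],
    "daisy": [0.0, 1.0, 0.1],
    "kite": [0.9, 0.1, 0.3],
}

# Hypothetical assignment of classes to tasks (datasets).
tasks = {
    "aircraft": ["airliner", "biplane"],
    "flowers": ["daisy"],
    "toys": ["kite"],
}

# Made-up embedding of an image that belongs to the "aircraft" task.
image_emb = [0.9, 0.1, 0.35]

# TIL: the task identity is given, so only its classes are candidates.
til_pred = predict(image_emb, vocabulary, allowed=tasks["aircraft"])

# CIL: no task identity, so every class seen so far is a candidate.
cil_pred = predict(image_emb, vocabulary)

print("TIL prediction:", til_pred)
print("CIL prediction:", cil_pred)
```

In this toy setup the two settings disagree: with the task known, the image is matched to an aircraft class, whereas without it a similar-looking class from another task scores higher. That is the kind of cross-dataset class correlation the summary describes as a challenge for the class-incremental setting.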