Abstract: Continual learning aims to enable a model to incrementally learn knowledge from sequentially arriving data. Previous works adopt the conventional classification architecture, which consists of a feature extractor and a classifier. The feature extractor is shared across sequentially arriving tasks or classes, but the classifier must be expanded with a new group of weights for each new class. Consequently, the parameters of a continual learner gradually increase. Moreover, as the classifier covers all previously seen classes, a memory buffer is usually required to store rehearsal data to mitigate classifier bias and catastrophic forgetting. In this paper, we propose a non-incremental learner, named AttriCLIP, to incrementally extract knowledge of new classes or tasks. Specifically, AttriCLIP is built upon the pre-trained visual-language model CLIP. Its image encoder and text encoder are kept fixed and extract features from images and text, respectively. The text input consists of a category name and a fixed number of learnable parameters, which are selected from our designed attribute word bank and serve as attributes. Because classification is performed by computing the similarity between visual and textual features, AttriCLIP is a non-incremental learner. The attribute prompts, which encode common knowledge useful for classification, effectively mitigate catastrophic forgetting and avoid the need to construct a replay memory. We evaluate AttriCLIP and compare it with CLIP-based and previous state-of-the-art continual learning methods in realistic settings with domain shift and long-sequence learning. The results show that our method performs favorably against previous state-of-the-art methods. The implementation code is available at https://github.com/bhrqw/AttriCLIP.
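The classification scheme described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract only says that prompts are selected from an attribute word bank, so the key-based top-k selection below is an assumption, `toy_text_encoder` is a hypothetical stand-in for CLIP's frozen text encoder, and all names and vectors are made up for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def select_attributes(image_feat, bank, k):
    """Assumed selection rule: keep the k bank entries whose keys best match the image."""
    ranked = sorted(bank, key=lambda e: cosine(image_feat, e["key"]), reverse=True)
    return ranked[:k]


def toy_text_encoder(prompts, class_embedding):
    """Hypothetical stand-in for a frozen text encoder: averages prompts and class vector."""
    vectors = [p["prompt"] for p in prompts] + [class_embedding]
    dim = len(class_embedding)
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def classify(image_feat, bank, class_embeddings, k=2):
    """Pick the class whose text feature is most similar to the image feature."""
    selected = select_attributes(image_feat, bank, k)
    scores = {
        name: cosine(image_feat, toy_text_encoder(selected, emb))
        for name, emb in class_embeddings.items()
    }
    return max(scores, key=scores.get), scores


# Made-up attribute bank: each entry pairs a key with a learnable prompt vector.
bank = [
    {"key": [1, 0, 0, 0], "prompt": [0, 0, 1, 0]},
    {"key": [0, 1, 0, 0], "prompt": [0, 0, 0, 1]},
    {"key": [0, 0, 1, 0], "prompt": [0, 0, 1, 1]},
]
class_embeddings = {"cat": [1, 0, 0, 0], "dog": [0, 1, 0, 0]}
image_feat = [0.9, 0.1, 0.0, 0.0]

pred, scores = classify(image_feat, bank, class_embeddings)

# A new class only adds a name to score against; the bank size is unchanged.
class_embeddings["bird"] = [0, 0, 0, 1]
pred_after, scores_after = classify(image_feat, bank, class_embeddings)
```

In this sketch, adding the class "bird" extends only the set of names to compare against; no classifier weights are added and the bank keeps its size, which is the sense in which such a learner is non-incremental.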

LVP-CLIP: Revisiting CLIP for Continual Learning with Label Vector Pool

Don't Stop Learning: Towards Continual Learning for the CLIP Model

AttriCLIP: A Non-Incremental Learner for Incremental Knowledge Learning

Continual Learning of Image Classes with Language Guidance from a Vision-Language Model

CoLeCLIP: Open-Domain Continual Learning via Joint Task Prompt and Vocabulary Learning

Advancing Cross-domain Discriminability in Continual Learning of Vision-Language Models

Class Incremental Learning with Pre-trained Vision-Language Models

Enhancing Visual Continual Learning with Language-Guided Supervision

CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models

Continual Vision-Language Representation Learning with Off-Diagonal Information

Adaptive Rank, Reduced Forgetting: Knowledge Retention in Continual Learning Vision-Language Models with Dynamic Rank-Selective LoRA

Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning

LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation

Prior-Free Continual Learning with Unlabeled Data in the Wild

Generative Negative Text Replay for Continual Vision-Language Pretraining

ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference

How Much Can CLIP Benefit Vision-and-Language Tasks?

ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation

Modeling Caption Diversity in Contrastive Vision-Language Pretraining

CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding