Abstract: Continual learning (CL) with Vision-Language Models (VLMs) has overcome the constraints of traditional CL, which only focuses on previously encountered classes. During the CL of VLMs, we need not only to prevent catastrophic forgetting of incrementally learned knowledge but also to preserve the zero-shot ability of VLMs. However, existing methods require additional reference datasets to maintain such zero-shot ability and rely on domain-identity hints to classify images across different domains. In this study, we propose Regression-based Analytic Incremental Learning (RAIL), which utilizes a recursive ridge regression-based adapter to learn from a sequence of domains in a non-forgetting manner and to decouple the cross-domain correlations by projecting features to a higher-dimensional space. Combined with a training-free fusion module, RAIL absolutely preserves the VLM's zero-shot ability on unseen domains without any reference data. Additionally, we introduce the Cross-domain Task-Agnostic Incremental Learning (X-TAIL) setting. In this setting, a CL learner is required to incrementally learn from multiple domains and classify test images from both seen and unseen domains without any domain-identity hint. We theoretically prove RAIL's absolute memorization on incrementally learned domains. Experimental results affirm RAIL's state-of-the-art performance in both X-TAIL and existing Multi-domain Task-Incremental Learning settings. The code is released at https://github.com/linghan1997/Regression-based-Analytic-Incremental-Learning.
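
To illustrate why a recursive ridge regression can learn from a sequence of domains without forgetting, the following is a minimal sketch of the generic recursive least-squares update, not the authors' released implementation. It assumes a fixed feature dimension and a fixed label set across domains (a simplification), and the names `init_state`, `recursive_update`, and `lam` are hypothetical. The sketch checks that updating domain by domain gives the same weights as solving ridge regression once on all the data.

```python
import numpy as np

def init_state(dim, num_classes, lam=1.0):
    # R is the inverse of the regularized autocorrelation matrix, initially (lam * I)^-1.
    R = np.eye(dim) / lam
    W = np.zeros((dim, num_classes))
    return R, W

def recursive_update(R, W, X, Y):
    # Woodbury identity: (A + X^T X)^-1 = R - R X^T (I + X R X^T)^-1 X R, with R = A^-1.
    K = np.linalg.solve(np.eye(X.shape[0]) + X @ R @ X.T, X @ R)
    R_new = R - R @ X.T @ K
    # Correct the previous weights using only the new domain's data.
    W_new = W + R_new @ X.T @ (Y - X @ W)
    return R_new, W_new

# Two synthetic "domains" (random features and one-hot labels, for illustration only).
rng = np.random.default_rng(0)
dim, num_classes, lam = 8, 3, 1.0
X1, X2 = rng.normal(size=(20, dim)), rng.normal(size=(30, dim))
Y1 = np.eye(num_classes)[rng.integers(0, num_classes, 20)]
Y2 = np.eye(num_classes)[rng.integers(0, num_classes, 30)]

# Incremental learning: each domain is seen once and then discarded.
R, W = init_state(dim, num_classes, lam)
for X, Y in [(X1, Y1), (X2, Y2)]:
    R, W = recursive_update(R, W, X, Y)

# Joint ridge solution computed with all data available at once.
X_all, Y_all = np.vstack([X1, X2]), np.vstack([Y1, Y2])
W_joint = np.linalg.solve(X_all.T @ X_all + lam * np.eye(dim), X_all.T @ Y_all)

print(np.allclose(W, W_joint))
```

Because the recursion reproduces the joint solution up to floating-point error, no earlier domain's data needs to be stored or replayed in this simplified setting; only the matrix `R` and the weights `W` are carried forward.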

Rethinking Domain Adaptation and Generalization in the Era of CLIP

Class-Level Adaptation Network with Self Training for Unsupervised Domain Adaptation

Domain Adaptation Meets Zero-Shot Learning: an Annotation-Efficient Approach to Multi-Modality Medical Image Segmentation

ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation

Adversarial Domain Adaptation with CLIP for Few-Shot Image Classification

CLIP-guided Black-Box Domain Adaptation of Image Classification

Choosing Wisely and Learning Deeply: Selective Cross-Modality Distillation via CLIP for Domain Generalization

CLIP the Divergence: Language-guided Unsupervised Domain Adaptation

Strong but simple: A Baseline for Domain Generalized Dense Perception by CLIP-based Transfer Learning

Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition

A Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with Language Guidance

DomainVerse: A Benchmark Towards Real-World Distribution Shifts For Tuning-Free Adaptive Domain Generalization

Improving Zero-Shot Generalization for CLIP with Variational Adapter

Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights

Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP

Dr. CLIP: CLIP-Driven Universal Framework for Zero-Shot Sketch Image Retrieval

In the Era of Prompt Learning with Vision-Language Models

Open-world Domain Adaptation and Generalization

Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models

Is Less More? Exploring Token Condensation as Training-free Adaptation for CLIP

Advancing Cross-domain Discriminability in Continual Learning of Vision-Language Models