Abstract: In machine learning, continual learning is a crucial concept that allows models to adapt to non-stationary data distributions. However, most existing works focus on uni-modal settings and ignore multi-modal data. In this paper, to enable neural networks to better understand diverse modalities in real-world scenarios, we investigate continual learning for two typical vision-language applications, i.e., retrieval and grounding. Instead of conventional exemplar-based methods, we leverage pre-trained transformer models (e.g., CLIP/GLIP) and the prompt technique to tackle this problem. Under this scheme, we identify two critical limitations in existing methods: (1) Unfamiliarity across tasks, which prevents task-specific prompts from achieving forward propagation; and (2) Heterogeneity between modalities, which makes it difficult to guarantee a consistent optimization direction for prompts of different modalities. To overcome these constraints, we design Historical Prompt Calibration, which includes two objectives to calibrate prompts. First, intra-modal relevance estimation helps encode sufficient task-specific information into prompts, with the help of a relevance estimator developed for recognizing task relevance. Second, inter-modal consistency alignment enhances the agreement of the two modality-specific prompts in the current task by contrasting them with the prompts from previous tasks. We demonstrate the superiority of our strategy over state-of-the-art methods on four vision-language applications, including two retrieval tasks (i.e., image- and video-text retrieval) and two grounding tasks (i.e., referring expression comprehension and segmentation).
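
To make the idea of inter-modal consistency alignment more concrete, the following is a minimal sketch of one way such an objective could look. It is not the paper's formulation: the abstract does not give the loss, so this assumes an InfoNCE-style contrast in which the current task's visual and text prompts form the positive pair and text prompts stored from previous tasks act as negatives. The function names, the flattening of prompts into plain vectors, and the `temperature` parameter are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def consistency_alignment_loss(vis_prompt, txt_prompt, past_txt_prompts,
                               temperature=0.1):
    """Hypothetical InfoNCE-style loss (an assumption, not the paper's loss).

    Positive pair: current visual prompt vs. current text prompt.
    Negatives: current visual prompt vs. text prompts from previous tasks.
    """
    pos = cosine(vis_prompt, txt_prompt) / temperature
    negs = [cosine(vis_prompt, p) / temperature for p in past_txt_prompts]
    logits = [pos] + negs
    # Numerically stable log-sum-exp over the positive and all negatives.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-probability assigned to the positive pair.
    return log_denom - pos


# Toy usage with 2-D "prompts" flattened to vectors (illustrative values only).
aligned = consistency_alignment_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = consistency_alignment_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
print(aligned < misaligned)
```

Under these assumptions the loss is small when the two current-task prompts agree with each other more than with historical prompts, and large when the visual prompt sits closer to a previous task's text prompt. A symmetric term with the roles of the two modalities swapped would be a natural extension, but whether the paper uses one is not stated in the abstract.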

Prompt Gradient Projection for Continual Learning

UniGrad-FS: Unified Gradient Projection with Flatter Sharpness for Continual Learning

Visual Prompt Tuning in Null Space for Continual Learning

Prompt-aligned Gradient for Prompt Tuning

Gradient Projection For Continual Parameter-Efficient Tuning

Vector Quantization Prompting for Continual Learning

Evolving Parameterized Prompt Memory for Continual Learning

PECTP: Parameter-Efficient Cross-Task Prompts for Incremental Vision Transformer

Pro-tuning: Unified Prompt Tuning for Vision Tasks

Mixture of Experts Meets Prompt-Based Continual Learning

When Prompt-based Incremental Learning Does Not Meet Strong Pretraining

Class Gradient Projection for Continual Learning

Prompting to Prompt for Rehearsal-Free Class Incremental Learning

Hierarchical Prompts for Rehearsal-free Continual Learning

Gradient Projection Memory for Continual Learning

Calibrating Prompt from History for Continual Vision-Language Retrieval and Grounding

Progressive Visual Prompt Learning with Contrastive Feature Re-formation

Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models

INCPrompt: Task-Aware incremental Prompting for Rehearsal-Free Class-incremental Learning