Abstract: In cross-domain few-shot classification (CFC), recent works mainly focus on adapting a simple transformation head on top of a frozen pre-trained backbone with few labeled data, projecting embeddings into a task-specific metric space where classification is performed by measuring similarities between image instance and prototype representations. An assumption implicitly adopted in such a framework is that the prototype and image instance embeddings share the same representation transformation. However, in this paper, we find that there naturally exists a gap, which resembles the modality gap, between the prototype and image instance embeddings extracted from the frozen pre-trained backbone, and that simply applying the same transformation during the adaptation phase constrains the exploration of optimal representations and shrinks the gap between prototype and image representations. To solve this problem, we propose a simple yet effective method, contrastive prototype-image adaptation (CoPA), which adapts different transformations for prototypes and images, similarly to CLIP, by treating prototypes as text prompts. Extensive experiments on Meta-Dataset demonstrate that CoPA achieves state-of-the-art performance more efficiently. Further analyses also indicate that CoPA can learn better representation clusters, enlarge the gap, and achieve minimal validation loss at the enlarged gap.
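The shared-transformation framework the abstract describes can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the identity matrix stands in for a learned head, the embeddings are synthetic, and the centroid-distance gap measure is borrowed from the way the modality gap is commonly measured rather than taken from the paper.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def class_prototypes(embeddings, labels, num_classes):
    # A prototype is the mean embedding of the support images of one class.
    return np.stack([embeddings[labels == c].mean(axis=0) for c in range(num_classes)])

def shared_head_logits(image_emb, prototypes, weight):
    # Baseline assumption: images and prototypes pass through the SAME linear head.
    z_img = l2_normalize(image_emb @ weight)
    z_pro = l2_normalize(prototypes @ weight)
    return z_img @ z_pro.T  # cosine similarity, shape (num_images, num_classes)

def centroid_gap(image_emb, prototypes):
    # Assumed gap measure: distance between the centroids of the
    # normalized image embeddings and the normalized prototypes.
    diff = l2_normalize(image_emb).mean(axis=0) - l2_normalize(prototypes).mean(axis=0)
    return float(np.linalg.norm(diff))

# Synthetic stand-in for frozen-backbone embeddings: 3 classes, 5 shots, 8 dims.
rng = np.random.default_rng(0)
num_classes, shots, dim = 3, 5, 8
class_means = rng.normal(size=(num_classes, dim)) * 3.0
labels = np.repeat(np.arange(num_classes), shots)
support = class_means[labels] + rng.normal(size=(num_classes * shots, dim))

protos = class_prototypes(support, labels, num_classes)
logits = shared_head_logits(support, protos, np.eye(dim))
pred = logits.argmax(axis=1)  # nearest-prototype prediction per image
gap = centroid_gap(support, protos)
```

In this sketch the single `weight` matrix is the constraint the abstract questions: whatever the head does to image embeddings, it must also do to prototypes.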

What problem does this paper attempt to address?

The problem this paper addresses arises in cross-domain few-shot classification (CFC): when existing methods apply the same representation transformation to prototypes and image instances during the adaptation phase, the gap between the two shrinks, which limits the model's ability to explore the optimal representation. Specifically:

1. **Existing problems**:
   - In CFC tasks, existing methods usually assume that the embeddings of prototypes and image instances share the same representation transformation.
   - This assumption ignores the fact that prototypes and image instances describe different levels of information: prototypes contain abstract information aggregated over the image instances of a class, while image instances contain specific instance-level information.
   - Therefore, using the same representation transformation may prevent the model from learning the optimal representation and reduce the natural gap between prototypes and image instances.
2. **Research findings**:
   - The authors show empirically that, among the embeddings extracted by the frozen pre-trained backbone, there is a gap between prototypes and image instances that resembles the modality gap.
   - When the same representation transformation is applied, this gap shrinks, and the model fails to learn compact and distinct representation clusters.
3. **Proposed method**:
   - To solve the above problems, the authors propose the contrastive prototype-image adaptation (CoPA) method.
   - CoPA applies different representation transformations to prototypes and image instances, similarly to CLIP, by treating prototypes as text prompts.
   - This can preserve the discriminative information in the gradients and explore the optimal representation while maintaining the gap between prototypes and image instances.
4. **Experimental results**:
   - Extensive experiments show that CoPA not only achieves state-of-the-art performance on the Meta-Dataset benchmark, but also learns compact image representation clusters more effectively and reaches the global minimum of the validation loss at the enlarged gap.

In summary, this paper targets the shrinking of the gap between prototypes and image instances caused by a shared representation transformation in cross-domain few-shot classification, and proposes CoPA to address it.
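The idea of giving prototypes and images separate transformations, trained with a CLIP-like contrastive objective, might look roughly like the forward pass below. This is a sketch under stated assumptions, not the authors' implementation: the function name `copa_style_loss`, the linear heads `w_image` and `w_proto`, the temperature value, and the way multiple positives are averaged in the prototype-to-image direction are all illustrative choices that the summary above does not specify.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Unit-normalize rows so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def log_softmax(z, axis):
    # Numerically stable log-softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def copa_style_loss(image_emb, labels, num_classes, w_image, w_proto, temperature=0.1):
    # Prototypes are class means of the frozen embeddings
    # (assumes every class has at least one labeled image).
    prototypes = np.stack([image_emb[labels == c].mean(axis=0) for c in range(num_classes)])

    # Key difference from a shared head: two SEPARATE linear transformations.
    z_img = l2_normalize(image_emb @ w_image)
    z_pro = l2_normalize(prototypes @ w_proto)
    logits = z_img @ z_pro.T / temperature  # shape (num_images, num_classes)

    targets = np.eye(num_classes)[labels]   # one-hot labels, shape (num_images, num_classes)

    # Image-to-prototype direction: each image should pick its class prototype.
    loss_i2p = -(targets * log_softmax(logits, axis=1)).sum(axis=1).mean()

    # Prototype-to-image direction (assumed form): each prototype should pick
    # its own images; positives within a class are weighted uniformly.
    pos_weights = targets / targets.sum(axis=0, keepdims=True)
    loss_p2i = -(pos_weights * log_softmax(logits, axis=0)).sum(axis=0).mean()

    return 0.5 * (loss_i2p + loss_p2i), logits

# Synthetic stand-in for frozen-backbone embeddings: 4 classes, 5 shots, 16 dims.
rng = np.random.default_rng(0)
num_classes, shots, dim = 4, 5, 16
labels = np.repeat(np.arange(num_classes), shots)
image_emb = rng.normal(size=(num_classes * shots, dim))

# Both heads start at identity here; in practice they would be trained separately.
loss, logits = copa_style_loss(image_emb, labels, num_classes, np.eye(dim), np.eye(dim))
```

Because `w_image` and `w_proto` are independent parameters, an optimizer is free to move prototypes and images into different regions of the metric space, which a single shared head cannot do.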

Mind the Gap Between Prototypes and Images in Cross-domain Finetuning

Adaptive Parametric Prototype Learning for Cross-Domain Few-Shot Classification

PrototypeFormer: Learning to Explore Prototype Relationships for Few-shot Image Classification

Contrastive prototype network with prototype augmentation for few-shot classification

Few-Shot Cross-Domain Object Detection With Instance-Level Prototype-Based Meta-Learning

Meta-Collaborative Comparison for Effective Cross-Domain Few-Shot Learning

BMPCN: A Bigraph Mutual Prototype Calibration Net for Few-Shot Classification

Cross-Domain Few-Shot Classification Via Adversarial Task Augmentation

Revisiting Prototypical Network for Cross Domain Few-Shot Learning

Dual Adaptive Representation Alignment for Cross-domain Few-shot Learning

MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel Dependence

Bidirectional Matching Prototypical Network for Few-Shot Image Classification

CLIP Adaptation by Intra-modal Overlap Reduction

InCo: Intermediate Prototype Contrast for Unsupervised Domain Adaptation

Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation

Prototype Rectification with Region-Wise Foreground Enhancement for Few-Shot Classification

Cross-Domain Detection Via Graph-Induced Prototype Alignment

Cross-Domain Few-Shot Learning via Adaptive Transformer Networks

ICA-Proto: Iterative Cross Alignment Prototypical Network for Incremental Few-Shot Relation Classification

Target Oriented Dynamic Adaption for Cross-Domain Few-Shot Learning