Mind the Gap Between Prototypes and Images in Cross-domain Finetuning

Hongduan Tian, Feng Liu, Zhanke Zhou, Tongliang Liu, Chengqi Zhang, Bo Han
2024-10-20
Abstract: In cross-domain few-shot classification (CFC), recent works mainly focus on adapting a simple transformation head on top of a frozen pre-trained backbone with few labeled data to project embeddings into a task-specific metric space where classification can be performed by measuring similarities between image instance and prototype representations. Technically, an assumption implicitly adopted in such a framework is that the prototype and image instance embeddings share the same representation transformation. However, in this paper, we find that there naturally exists a gap, which resembles the modality gap, between the prototype and image instance embeddings extracted from the frozen pre-trained backbone, and simply applying the same transformation during the adaptation phase constrains exploring the optimal representations and shrinks the gap between prototype and image representations. To solve this problem, we propose a simple yet effective method, contrastive prototype-image adaptation (CoPA), to adapt different transformations respectively for prototypes and images similarly to CLIP by treating prototypes as text prompts. Extensive experiments on Meta-Dataset demonstrate that CoPA achieves the state-of-the-art performance more efficiently. Meanwhile, further analyses also indicate that CoPA can learn better representation clusters, enlarge the gap, and achieve minimal validation loss at the enlarged gap.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper addresses a limitation of current methods for cross-domain few-shot classification (CFC): during the adaptation phase they apply the same representation transformation to both prototypes and image instances, which shrinks the gap between the two and limits the model's ability to find optimal representations. Specifically:

1. **Existing problems**:
   - In CFC, existing methods implicitly assume that prototype embeddings and image instance embeddings share the same representation transformation.
   - This assumption overlooks that the two describe different levels of information: a prototype carries abstract information about the image instances within a class, while an image instance carries instance-level information.
   - A shared transformation can therefore keep the model from learning optimal representations and shrinks the natural gap between prototypes and image instances.
2. **Research findings**:
   - The authors show empirically that, among the embeddings extracted by the frozen pre-trained backbone, there is a gap between prototypes and image instances that resembles the modality gap.
   - Applying the same representation transformation to both shrinks this gap and prevents the model from learning compact, well-separated representation clusters.
3. **Proposed method**:
   - To address this, the authors propose contrastive prototype-image adaptation (CoPA).
   - CoPA adapts separate representation transformations for prototypes and for image instances, in the manner of CLIP, by treating prototypes as text prompts.
   - This preserves the discriminative information in the gradients and allows the model to explore optimal representations while maintaining the gap between prototypes and image instances.
4. **Experimental results**:
   - Extensive experiments show that CoPA achieves state-of-the-art performance on the Meta-Dataset benchmark, and does so more efficiently.
   - Further analyses indicate that CoPA learns better representation clusters, enlarges the gap, and attains its minimal validation loss at the enlarged gap.

In summary, the paper targets the shrinking of the prototype-image gap caused by a shared representation transformation in cross-domain few-shot classification, and proposes CoPA to address it.
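The idea of decoupled transformations can be illustrated with a small sketch. This is not the authors' implementation: it uses plain Python with fixed toy matrices in place of learned heads, and all names (`class_prototypes`, `centroid_gap`, `symmetric_contrastive_loss`, `W_img`, `W_proto`, `temperature`) are hypothetical. Two further assumptions are made for illustration: the gap is measured as the distance between the centroids of the normalized image and prototype embeddings, as the modality gap is often quantified, and the prototype-to-image direction of the loss averages over all images of the class.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def matvec(W, v):
    """Apply a linear map W (given as a list of rows) to a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def class_prototypes(embeddings, labels, num_classes):
    """Prototype of a class = mean of the embeddings carrying that label."""
    dim = len(embeddings[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for e, y in zip(embeddings, labels):
        counts[y] += 1
        for d in range(dim):
            sums[y][d] += e[d]
    return [[s / counts[c] for s in sums[c]] for c in range(num_classes)]

def centroid_gap(images, prototypes):
    """Distance between the centroids of the two normalized embedding sets
    (assumed gap measure, not taken from the paper)."""
    def centroid(vs):
        vs = [normalize(v) for v in vs]
        return [sum(col) / len(vs) for col in zip(*vs)]
    ci, cp = centroid(images), centroid(prototypes)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ci, cp)))

def cross_entropy(logits, target):
    """Negative log-softmax of the target entry (numerically stable)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return lse - logits[target]

def symmetric_contrastive_loss(images, labels, prototypes,
                               W_img, W_proto, temperature=0.1):
    """CLIP-style symmetric loss with separate heads for images and prototypes."""
    z_img = [normalize(matvec(W_img, x)) for x in images]
    z_pro = [normalize(matvec(W_proto, p)) for p in prototypes]
    # logits[i][c]: scaled cosine similarity between image i and prototype c
    logits = [[sum(a * b for a, b in zip(zi, zp)) / temperature for zp in z_pro]
              for zi in z_img]
    # Image -> prototype: each image should select its own class prototype.
    loss_i2p = sum(cross_entropy(row, y) for row, y in zip(logits, labels)) / len(labels)
    # Prototype -> image: each prototype should select the images of its class
    # (averaging over multiple positives is an assumption of this sketch).
    loss_p2i = 0.0
    for c in range(len(z_pro)):
        column = [row[c] for row in logits]
        positives = [i for i, y in enumerate(labels) if y == c]
        loss_p2i += sum(cross_entropy(column, i) for i in positives) / len(positives)
    loss_p2i /= len(z_pro)
    return 0.5 * (loss_i2p + loss_p2i)

# Toy stand-ins for frozen-backbone embeddings of a 2-way, 2-shot support set.
support = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
labels = [0, 0, 1, 1]
protos = class_prototypes(support, labels, num_classes=2)
identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]

print("gap:", centroid_gap(support, protos))
print("loss, same head for both:",
      symmetric_contrastive_loss(support, labels, protos, identity, identity))
print("loss, different prototype head:",
      symmetric_contrastive_loss(support, labels, protos, identity, swap))
```

Passing the same matrix for `W_img` and `W_proto` corresponds to the shared-transformation setting the paper argues against; passing two different matrices corresponds to the decoupled setting. The `swap` matrix is chosen only to show that the prototype head affects the loss independently of the image head. In an actual adaptation run, both heads would be trained by gradient descent on the few labeled support images.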