Prompt as Free Lunch: Enhancing Diversity in Source-Free Cross-domain Few-shot Learning through Semantic-Guided Prompting

Linhai Zhuo, Zheng Wang, Yuqian Fu, Tianwen Qian
2024-12-01
Abstract: The source-free cross-domain few-shot learning (CD-FSL) task aims to transfer pretrained models to target domains using minimal samples, eliminating the need for source domain data. Addressing this issue requires models to have robust generalization abilities and strong feature representation, aligning with the characteristics of large-scale pretrained models. However, large-scale models tend to lose representational ability in cross-domain scenarios due to limited sample diversity. Given the abundant diversity provided by the semantic modality, this paper leverages the textual modality to enhance training sample diversity with the CLIP model, while also improving model transfer efficiency. Specifically, we propose the SeGD-VPT framework, which is divided into two phases. The first phase aims to increase feature diversity by adding diversity prompts to each support sample, thereby generating varied inputs and enhancing sample diversity. Furthermore, we use diverse descriptions of classes to guide semantically meaningful learning of diversity prompts, proposing random combinations and selections of texts to increase textual diversity. Additionally, deep prompt tuning is introduced to enhance the model's transfer capability. After the first phase is trained, support samples with different diversity prompts are input into the CLIP backbone to generate enhanced features. The second phase then trains classifiers on the generated features. Extensive experimental results across several benchmarks verify that our method is comparable to SOTA source-utilized models and attains the best performance under the source-free CD-FSL setting.
Computer Vision and Pattern Recognition, Computation and Language, Machine Learning
What problem does this paper attempt to address?
This paper aims to solve two key challenges in **Source-Free Cross-Domain Few-Shot Learning (CD-FSL)**:

1. **Mismatch of data distribution**: The data distribution in the target domain differs significantly from that of the pre-trained model's source domain, so the model performs poorly in the target domain. For example, the X-ray images in the ChestX dataset strongly affect the visual encoder, making it difficult to transfer the encoder to recognize the features unique to target-domain images.
2. **Lack of sample diversity**: Although large-scale pre-trained models have strong feature representation capabilities, the few samples available in few-shot learning tasks lack diversity and cannot accurately reflect the data distribution of the target domain. This weakens the model's representation ability and seriously harms its generalization performance in the target domain.

To solve these problems, the authors propose the **Semantic Guided Diversity Visual Prompt Tuning (SeGD-VPT)** framework. Specifically, SeGD-VPT enhances sample diversity and improves the model's transfer ability in the following ways:

- **Visual modality**: Add learnable diversity prompt tokens at the image input layer to increase the diversity of the CLIP visual encoder's inputs.
- **Text modality**: Collect multiple descriptions of each category from the web as describe prompts, and generate rich and diverse semantic features through random combinations. Use contrastive learning to align these features with class prompts to ensure classification consistency.
- **Cross-modality**: Use the diverse semantic features to guide the learning of diversity prompts, making the prompts more semantically meaningful, and further enhance the alignment between visual features and text features through the Target Supervised Contrastive Loss (TSC loss).
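The random combination of class descriptions in the text modality can be illustrated with a small sketch. This is not the authors' implementation: the function name `sample_text_prompts`, the "a photo of a ..." class prompt, and the example descriptions are all assumptions chosen for illustration, and in practice the resulting strings would be passed to the CLIP text encoder.

```python
import random


def sample_text_prompts(class_prompt, descriptions, num_prompts, per_prompt, seed=None):
    """Build several text prompts for one class by randomly combining descriptions.

    Hypothetical helper: each prompt is the class prompt followed by
    `per_prompt` descriptions drawn without replacement.
    """
    if per_prompt > len(descriptions):
        raise ValueError("per_prompt cannot exceed the number of descriptions")
    rng = random.Random(seed)  # local generator so sampling is reproducible
    prompts = []
    for _ in range(num_prompts):
        chosen = rng.sample(descriptions, per_prompt)
        prompts.append(class_prompt + " " + " ".join(chosen))
    return prompts


# Placeholder descriptions, invented for this example only.
descriptions = [
    "It shows a grayscale view of the chest.",
    "The ribs appear as bright curved bands.",
    "The lungs appear as darker regions.",
]
prompts = sample_text_prompts(
    "a photo of a chest X-ray.", descriptions, num_prompts=4, per_prompt=2, seed=0
)
for p in prompts:
    print(p)
```

Drawing a different subset on each call gives the text side many distinct prompts per class, which is the source of the added semantic diversity in this sketch.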
Through these methods, the SeGD-VPT framework improves model performance on cross-domain few-shot learning tasks in the source-free setting and achieves better results than existing source-free methods on multiple benchmark datasets.
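The exact form of the TSC loss is not given in this summary. As a rough illustration of the cross-modality alignment it describes, the sketch below uses a generic contrastive (softmax cross-entropy over cosine similarities) objective that pulls each visual feature toward the text feature of its own class. The function names, the temperature value of 0.07, and the toy two-dimensional features are assumptions, and this should not be read as the paper's loss.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_alignment_loss(visual_feats, labels, text_feats, temperature=0.07):
    """Mean cross-entropy of each visual feature against all class text features.

    Generic stand-in for a visual-text alignment loss: text_feats[c] is the
    text feature of class c, and labels[i] is the class of visual_feats[i].
    """
    total = 0.0
    for feat, label in zip(visual_feats, labels):
        logits = [cosine(feat, t) / temperature for t in text_feats]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[label]
    return total / len(visual_feats)


# Toy features: two classes, two support samples.
text_feats = [[1.0, 0.0], [0.0, 1.0]]
visual_feats = [[0.9, 0.1], [0.1, 0.9]]
aligned = contrastive_alignment_loss(visual_feats, [0, 1], text_feats)
swapped = contrastive_alignment_loss(visual_feats, [1, 0], text_feats)
print(aligned < swapped)
```

In this toy setup the loss is lower when each visual feature is paired with the text feature it already resembles, so minimizing it moves visual features toward their class text features.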