Abstract: The source-free cross-domain few-shot learning (CD-FSL) task aims to transfer pretrained models to target domains using only a few samples, without access to source-domain data. Addressing this task requires models with robust generalization and strong feature representation, which matches the characteristics of large-scale pretrained models. However, large-scale models tend to lose representational ability in cross-domain scenarios because of limited sample diversity. Given the abundant diversity provided by the semantic modality, this paper leverages the textual modality to enhance training-sample diversity with the CLIP model, while also improving model transfer efficiency. Specifically, we propose the SeGD-VPT framework, which is divided into two phases. The first phase aims to increase feature diversity by adding diversity prompts to each support sample, thereby generating varied inputs and enhancing sample diversity. Furthermore, we use diverse descriptions of classes to guide semantically meaningful learning of the diversity prompts, and we propose random combination and selection of texts to increase textual diversity. Additionally, deep prompt tuning is introduced to enhance the model's transfer capability. After the first phase is trained, support samples with different diversity prompts are fed into the CLIP backbone to generate enhanced features. The second phase then trains classifiers on the generated features. Extensive experimental results across several benchmarks verify that our method is comparable to SOTA source-utilized models and attains the best performance under the source-free CD-FSL setting.


### What problems does this paper attempt to solve?

This paper aims to solve two key challenges in **Source-Free Cross-Domain Few-Shot Learning (CD-FSL)**:

1. **Mismatch of data distribution**: The data distribution in the target domain differs significantly from that of the source domain on which the model was pretrained, so the model performs poorly in the target domain. For example, X-ray images in the ChestX dataset strongly affect the visual encoder, making it difficult to adapt the encoder to recognize the features unique to target-domain images.
2. **Lack of sample diversity**: Although large-scale pretrained models have strong feature representation capabilities, the few samples available in few-shot learning tasks are limited in number and diversity, so they cannot accurately reflect the data distribution of the target domain. This weakens the model's representation ability and seriously affects its generalization performance in the target domain.

To solve these problems, the authors propose the **Semantic Guided Diversity Visual Prompt Tuning (SeGD-VPT)** framework. Specifically, SeGD-VPT enhances sample diversity and improves the model's transfer ability in the following ways:

- **Visual modality**: Add learnable diversity prompt tokens at the image input layer to increase the diversity of the CLIP visual encoder's inputs.
- **Text modality**: Collect multiple descriptions of each category from the internet as description prompts, and generate rich and diverse semantic features through random combinations. Use contrastive learning to align these features with the class prompts to ensure classification consistency.
- **Cross-modality**: Use the diverse semantic features to guide the learning of the diversity prompts, making the prompts more semantically meaningful, and further strengthen the alignment between visual and text features through the Target Supervised Contrastive loss (TSC loss).
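The random combination of class descriptions in the text-modality step can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names, the prompt template, the number of descriptions per prompt (`k`), and the example descriptions are all hypothetical and do not come from the paper.

```python
import random


def sample_description_prompt(class_name, descriptions, k, rng):
    """Build one text prompt from k randomly selected descriptions.

    The template "a photo of a ..." is an assumed, illustrative choice.
    """
    k = min(k, len(descriptions))
    chosen = rng.sample(descriptions, k)  # random selection without replacement
    return f"a photo of a {class_name}, " + ", ".join(chosen)


def build_prompt_set(class_name, descriptions, n_prompts=4, k=2, seed=0):
    """Generate several differently combined prompts for one class."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [
        sample_description_prompt(class_name, descriptions, k, rng)
        for _ in range(n_prompts)
    ]


# Placeholder descriptions, made up for this example only.
descriptions = [
    "which has yellow petals",
    "which has a dark central disk",
    "which grows on a tall stem",
    "which has broad green leaves",
]
prompts = build_prompt_set("sunflower", descriptions, n_prompts=4, k=2)
for p in prompts:
    print(p)
```

In a real pipeline each generated string would be passed through a text encoder to obtain one semantic feature per combination. Here the strings alone show how a small pool of descriptions yields many distinct prompts.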
Through these methods, the SeGD-VPT framework effectively improves model performance on cross-domain few-shot learning tasks in a source-free setting and achieves better results than existing methods on multiple benchmark datasets.
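To make the idea of aligning visual features with text features concrete, the sketch below computes a generic image-to-text contrastive (cross-entropy over cosine similarities) loss. It is not the paper's TSC loss, whose exact form is not given here. The function names, the temperature value, and the toy 2-D feature vectors are assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_alignment_loss(visual_feats, labels, text_feats, temperature=0.07):
    """Mean cross-entropy of each image feature against all class text features.

    visual_feats: one feature vector per image
    labels:       class index of each image
    text_feats:   one feature vector per class
    The temperature of 0.07 is an assumed default, not a value from the paper.
    """
    total = 0.0
    for feat, label in zip(visual_feats, labels):
        logits = [cosine(feat, t) / temperature for t in text_feats]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[label]
    return total / len(visual_feats)


# Toy features: two classes, two images (values are made up).
text_feats = [[1.0, 0.0], [0.0, 1.0]]
visual_feats = [[0.9, 0.1], [0.2, 0.8]]

aligned_loss = contrastive_alignment_loss(visual_feats, [0, 1], text_feats)
misaligned_loss = contrastive_alignment_loss(visual_feats, [1, 0], text_feats)
print(aligned_loss < misaligned_loss)
```

The loss is small when each image feature points toward its own class's text feature and large when the labels are swapped, which is the behavior a contrastive alignment objective is meant to reward.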

Prompt as Free Lunch: Enhancing Diversity in Source-Free Cross-domain Few-shot Learning through Semantic-Guided Prompting

HybridPrompt: Domain-Aware Prompting for Cross-Domain Few-Shot Learning

Exploring Cross-Domain Few-Shot Classification via Frequency-Aware Prompting

Step-wise Distribution Alignment Guided Style Prompt Tuning for Source-free Cross-domain Few-shot Learning

Ontology-enhanced Prompt-tuning for Few-shot Learning

Enhancing Vision-Language Models Generalization via Diversity-Driven Novel Feature Synthesis

Deeply Coupled Cross-Modal Prompt Learning

Semantic Prompt for Few-Shot Image Recognition

Self-supervised Meta-Prompt Learning with Meta-Gradient Regularization for Few-shot Generalization

APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation

Less is More: A Closer Look at Semantic-based Few-Shot Learning

Prompt-and-Transfer: Dynamic Class-aware Enhancement for Few-shot Segmentation

Task-Adaptive Prompted Transformer for Cross-Domain Few-Shot Learning

Hierarchy-Aware Interactive Prompt Learning for Few-Shot Classification

Enhancing Information Maximization with Distance-Aware Contrastive Learning for Source-Free Cross-Domain Few-Shot Learning

SYNC-CLIP: Synthetic Data Make CLIP Generalize Better in Data-Limited Scenarios

Adaptive Semantic Consistency for Cross-domain Few-shot Classification

Learning to Prompt Your Domain for Vision-Language Models

Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners

Fairness-guided Few-shot Prompting for Large Language Models