TAVP: Task-Adaptive Visual Prompt for Cross-domain Few-shot Segmentation

Jiaqi Yang, Yaning Zhang, Jingxi Hu, Xiangjian He, Linlin Shen, Guoping Qiu
2024-12-28
Abstract: Large visual models (LVMs) have demonstrated significant potential in image understanding thanks to large-scale pre-training, and the Segment Anything Model (SAM) has likewise achieved great success in image segmentation, supporting flexible interactive cues and showing strong learning capabilities. However, SAM's performance often falls short in cross-domain and few-shot applications. Previous work has performed poorly in transferring prior knowledge from base models to new applications. To tackle this issue, we propose a task-adaptive auto-visual prompt framework, a new paradigm for Cross-domain Few-shot Segmentation (CD-FSS). First, a Multi-level Feature Fusion (MFF) module is used for integrated feature extraction as prior knowledge. In addition, we incorporate a Class Domain Task-Adaptive Auto-Prompt (CDTAP) module to enable class-domain agnostic feature extraction and generate high-quality, learnable visual prompts. This advancement combines a unique generative approach to prompts with a comprehensive model structure and specialized prototype computation. While ensuring that the prior knowledge of SAM is not discarded, the new branch disentangles category and domain information through prototypes, guiding the model in adapting to CD-FSS. Comprehensive experiments across four cross-domain datasets demonstrate that our model outperforms the state-of-the-art CD-FSS approach, achieving an average accuracy improvement of 1.3% in the 1-shot setting and 11.76% in the 5-shot setting.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the insufficient performance of existing large-scale visual models, such as the Segment Anything Model (SAM), in cross-domain and few-shot scenarios, specifically in the Cross-domain Few-shot Segmentation (CD-FSS) task. The main issues are:

1. **Poor cross-domain generalization ability**: Although SAM performs excellently on natural image segmentation, it generalizes relatively poorly to domain-specific tasks such as medical image segmentation and remote sensing image segmentation.
2. **High-cost domain-specific adaptation**: Adapting SAM to a specific domain requires extensive data collection, sample labeling, and model training, which makes the approach resource-intensive and impractical.
3. **Difficulty in covering all domains**: It is impossible to enumerate and solve every possible domain-specific requirement, which limits the scalability of SAM-based methods as domain-specific demands diversify and evolve.
4. **Poor transfer learning strategies**: Although recent works have combined SAM with meta-learning for transfer learning, these methods mainly focus on fine-tuning the SAM encoder or implementing a teacher-student framework through knowledge distillation, and they fail to provide a comprehensive solution for adapting efficiently to different domains.

To solve these problems, the authors propose the Task-Adaptive Visual Prompt (TAVP) framework, which aims to improve performance on the cross-domain few-shot segmentation task. The framework includes the following key components:

- **Multi-level Feature Fusion (MFF)**: Used for comprehensive feature extraction as prior knowledge.
- **Class Domain Task-Adaptive Auto-Prompt (CDTAP)**: Achieves category- and domain-independent feature extraction and generates high-quality, learnable visual prompts.

With these improvements, TAVP improves the performance of SAM on the cross-domain few-shot segmentation task while retaining its original advantages. Experimental results across four cross-domain datasets show an average accuracy improvement over the state-of-the-art CD-FSS approach of 1.3% in the 1-shot setting and 11.76% in the 5-shot setting.
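
The abstract mentions that category and domain information is disentangled through prototypes, but it does not say how those prototypes are computed. For orientation only, the sketch below shows masked average pooling, a prototype computation commonly used in few-shot segmentation, followed by cosine-similarity matching. It is not the authors' CDTAP module: the function names, the `(C, H, W)` feature layout, and the toy inputs are all assumptions made for illustration.

```python
import numpy as np


def masked_average_prototype(features, mask, eps=1e-8):
    """Average the feature vectors that fall inside a binary mask.

    features: array of shape (C, H, W), e.g. an encoder feature map (assumed layout).
    mask:     array of shape (H, W) with 1 for foreground and 0 for background.
    Returns a class prototype of shape (C,).
    """
    features = np.asarray(features, dtype=np.float64)
    mask = np.asarray(mask, dtype=np.float64)
    # Zero out background positions, then sum over the spatial dimensions.
    weighted_sum = (features * mask[None, :, :]).sum(axis=(1, 2))
    return weighted_sum / (mask.sum() + eps)


def cosine_similarity_map(features, prototype, eps=1e-8):
    """Cosine similarity between a prototype and every spatial location.

    Returns an (H, W) map; higher values mean "more like the prototype".
    """
    features = np.asarray(features, dtype=np.float64)
    prototype = np.asarray(prototype, dtype=np.float64)
    dot = np.tensordot(prototype, features, axes=([0], [0]))  # shape (H, W)
    norms = np.linalg.norm(features, axis=0) * np.linalg.norm(prototype)
    return dot / (norms + eps)


# Toy example (hypothetical data): channel 0 fires on the top row,
# channel 1 fires on the bottom row, and the top row is the foreground.
support_features = np.zeros((2, 2, 2))
support_features[0, 0, :] = 1.0
support_features[1, 1, :] = 1.0
support_mask = np.array([[1, 1], [0, 0]])

prototype = masked_average_prototype(support_features, support_mask)
similarity = cosine_similarity_map(support_features, prototype)
```

In this toy case the prototype points along channel 0, so the similarity map is high on the top row and zero on the bottom row. In a prompt-based pipeline, a map like this could in principle be thresholded or turned into point or mask prompts, though how TAVP actually derives its learnable prompts is not described in the text above.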