Abstract: Large visual models (LVMs) have demonstrated significant potential in image understanding thanks to large-scale pre-training, and the Segment Anything Model (SAM) has achieved great success in image segmentation, supporting flexible interactive cues and strong learning capabilities. However, SAM's performance often falls short in cross-domain and few-shot applications, and previous work has performed poorly in transferring prior knowledge from base models to new applications. To tackle this issue, we propose a task-adaptive auto-visual prompt framework, a new paradigm for Cross-domain Few-shot Segmentation (CD-FSS). First, a Multi-level Feature Fusion (MFF) module is used for integrated feature extraction as prior knowledge. Second, we incorporate a Class Domain Task-Adaptive Auto-Prompt (CDTAP) module to enable class-domain agnostic feature extraction and generate high-quality, learnable visual prompts. This design combines a unique generative approach to prompts with a comprehensive model structure and specialized prototype computation. While ensuring that the prior knowledge of SAM is not discarded, the new branch disentangles category and domain information through prototypes, guiding the model in adapting to CD-FSS. Comprehensive experiments across four cross-domain datasets demonstrate that our model outperforms the state-of-the-art CD-FSS approach, achieving an average accuracy improvement of 1.3% in the 1-shot setting and 11.76% in the 5-shot setting.

What problem does this paper attempt to address?

The paper addresses the insufficient performance of existing large-scale visual models, such as the Segment Anything Model (SAM), in cross-domain and few-shot scenarios, in particular the Cross-domain Few-shot Segmentation (CD-FSS) task. Specifically:

1. **Poor cross-domain generalization ability**: Although SAM performs excellently on natural image segmentation, it generalizes relatively poorly to domain-specific tasks such as medical image segmentation and remote sensing image segmentation.
2. **High-cost domain-specific adaptation**: Adapting SAM to a specific domain requires a large amount of data collection, sample labeling, and model training, which makes the approach resource-intensive and impractical.
3. **Difficulty in covering all domains**: It is impossible to enumerate and solve every possible domain-specific requirement, which limits the scalability of SAM-based methods as domain-specific demands diversify and evolve.
4. **Poor transfer learning strategies**: Although recent works have combined SAM with meta-learning for transfer learning, these methods mainly focus on fine-tuning the SAM encoder or implementing a teacher-student framework through knowledge distillation, and they fail to provide a comprehensive solution for adapting efficiently to different domains.

To solve these problems, the authors propose a Task-Adaptive Visual Prompt framework (TAVP), which aims to improve performance on the cross-domain few-shot segmentation task. The framework includes the following key components:

- **Multi-level Feature Fusion (MFF)**: Used for comprehensive feature extraction as prior knowledge.
- **Class Domain Task-Adaptive Auto-Prompt (CDTAP)**: Achieves category- and domain-independent feature extraction and generates high-quality, learnable visual prompts.

Through these improvements, TAVP can effectively improve the performance of SAM on the cross-domain few-shot segmentation task while maintaining its original advantages. Experimental results on four cross-domain datasets show that the model improves average accuracy over the state-of-the-art CD-FSS approach by 1.3% in the 1-shot setting and 11.76% in the 5-shot setting.
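The summary mentions prototypes but does not spell out how they are computed. As a rough illustration only, and not the authors' CDTAP module, the sketch below shows masked average pooling, a common way to build class prototypes in few-shot segmentation: support features are averaged over the annotated region, and each query location is labeled by its closer prototype under cosine similarity. The function names (`masked_average_pooling`, `cosine_similarity`, `segment_query`) and the toy 2-channel features are assumptions made for this example.

```python
import math


def masked_average_pooling(features, mask):
    """Average the feature vectors at positions where mask == 1.

    features: H x W x C nested lists; mask: H x W nested lists of 0/1.
    Returns a C-dimensional prototype (all zeros if the mask is empty).
    """
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, m in zip(feat_row, mask_row):
            if m:
                count += 1
                for c in range(channels):
                    total[c] += vec[c]
    if count == 0:
        return total
    return [t / count for t in total]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def segment_query(query_features, fg_proto, bg_proto):
    """Label each query position 1 if it is closer to the foreground prototype."""
    return [
        [
            1 if cosine_similarity(v, fg_proto) > cosine_similarity(v, bg_proto) else 0
            for v in row
        ]
        for row in query_features
    ]


# Toy 1-shot episode: a 2x2 support feature map whose top row is foreground.
support_features = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]
support_mask = [[1, 1], [0, 0]]
background_mask = [[1 - m for m in row] for row in support_mask]

fg_proto = masked_average_pooling(support_features, support_mask)
bg_proto = masked_average_pooling(support_features, background_mask)

query_features = [[[0.8, 0.2], [0.2, 0.8]]]
prediction = segment_query(query_features, fg_proto, bg_proto)
print(prediction)
```

In a SAM-based pipeline, the features would come from the image encoder rather than hand-written lists, and a similarity map like this one could serve as a starting point for generating prompts; how TAVP actually does this is described in the paper itself, not in this sketch.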

TAVP: Task-Adaptive Visual Prompt for Cross-domain Few-shot Segmentation

APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation

Sam-Rsp: A New Few-Shot Segmentation Method Based on Segment Anything Model and Rough Segmentation Prompts

Prompt-and-Transfer: Dynamic Class-aware Enhancement for Few-shot Segmentation

SAM-Adapter: Adapting Segment Anything in Underperformed Scenes

Exploring Sparse Visual Prompt for Domain Adaptive Dense Prediction

Adaptive Prompt Learning with SAM for Few-shot Scanning Probe Microscope Image Segmentation

Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach

Self-guided Few-shot Semantic Segmentation for Remote Sensing Imagery Based on Large Vision Models

MVP-SEG: Multi-View Prompt Learning for Open-Vocabulary Semantic Segmentation

SAM-SP: Self-Prompting Makes SAM Great Again

Explicit Visual Prompting for Universal Foreground Segmentations

Bridge the Points: Graph-based Few-shot Segment Anything Semantically

Prompting Segment Anything Model with Domain-Adaptive Prototype for Generalizable Medical Image Segmentation

Self-Prompting Large Vision Models for Few-Shot Medical Image Segmentation

Segment Any Class (SAC): Multi-Class Few-Shot Semantic Segmentation via Class Region Proposals

X-Prompt: Multi-modal Visual Prompt for Video Object Segmentation

Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining

Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation

FusionSAM: Latent Space driven Segment Anything Model for Multimodal Fusion and Segmentation