Abstract: The primary challenge of cross-domain few-shot segmentation (CD-FSS) is the domain disparity between the training and inference phases, which can exist in either the input data or the target classes. Previous models struggle to learn feature representations that generalize to various unknown domains from limited training domain samples. In contrast, the large-scale visual model SAM, pre-trained on tens of millions of images from various domains and classes, possesses excellent generalizability. In this work, we propose a SAM-aware graph prompt reasoning network (GPRN) that fully leverages SAM to guide CD-FSS feature representation learning and improve prediction accuracy. Specifically, we propose a SAM-aware prompt initialization module (SPI) to transform the masks generated by SAM into visual prompts enriched with high-level semantic information. Because SAM tends to divide an object into many sub-regions, visual prompts representing the same semantic object may have inconsistent or fragmented features. We therefore propose a graph prompt reasoning (GPR) module that constructs a graph among visual prompts to reason about their interrelationships and enable each visual prompt to aggregate information from similar prompts, thus achieving global semantic consistency. Subsequently, each visual prompt embeds its semantic information into the corresponding mask region to assist in feature representation learning. To refine the segmentation mask during testing, we also design a non-parametric adaptive point selection module (APS) that selects representative point prompts from query predictions and feeds them back to SAM to refine inaccurate segmentation results. Experiments on four standard CD-FSS datasets demonstrate that our method establishes new state-of-the-art results. Code: https://github.com/CVL-hub/GPRN
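
The SPI and GPR steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that a mask is turned into a prompt by masked average pooling over a feature map, and that graph reasoning is one round of softmax-weighted aggregation over cosine similarities on a fully connected graph. The names `masked_average_pool`, `cosine`, `graph_aggregate`, and the `temperature` parameter are hypothetical.

```python
import math

def masked_average_pool(features, mask):
    """Average the feature vectors under a binary mask (assumed SPI-style pooling).

    features: H x W x C nested lists; mask: H x W nested lists of 0/1.
    """
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for row_feat, row_mask in zip(features, mask):
        for vec, m in zip(row_feat, row_mask):
            if m:
                count += 1
                for c in range(channels):
                    total[c] += vec[c]
    if count == 0:
        return total  # empty mask: return a zero prompt
    return [t / count for t in total]

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

def graph_aggregate(prompts, temperature=0.1):
    """One round of similarity-weighted aggregation over all prompts (assumed GPR-style)."""
    refined = []
    for p in prompts:
        sims = [cosine(p, q) / temperature for q in prompts]
        peak = max(sims)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in sims]
        norm = sum(weights)
        refined.append([
            sum(w * q[d] for w, q in zip(weights, prompts)) / norm
            for d in range(len(p))
        ])
    return refined

# Toy 2 x 2 feature map with 2 channels; the top row is one "object", the bottom row another.
features = [[[1.0, 0.0], [0.9, 0.1]],
            [[0.0, 1.0], [0.1, 0.9]]]
# Three masks: the top object is split into two sub-regions, as SAM might do.
masks = [[[1, 0], [0, 0]],
         [[0, 1], [0, 0]],
         [[0, 0], [1, 1]]]
prompts = [masked_average_pool(features, m) for m in masks]
refined = graph_aggregate(prompts)
```

In this toy setup the two prompts that come from fragments of the same object move closer together after aggregation, while the prompt for the other object stays essentially unchanged, which is the kind of consistency the abstract attributes to graph reasoning.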

What problem does this paper attempt to address?

This paper addresses the domain gap in cross-domain few-shot segmentation (CD-FSS). The challenge of CD-FSS lies in the domain differences between the training and inference stages, which may exist in the input data or in the target classes. Previous models, trained on limited samples from the training domain, have difficulty learning feature representations that generalize to unknown domains. To address this, the paper proposes a Graph Prompt Reasoning Network (GPRN) based on SAM (Segment Anything Model), which uses SAM to guide CD-FSS feature representation learning and improve prediction accuracy.

### Problem Background

1. **Challenges of Cross-Domain Few-Shot Segmentation**:
   - The training and test datasets belong to different domains.
   - The model has difficulty learning, from limited training samples, feature representations that generalize to different domains.
2. **Limitations of Existing Methods**:
   - Existing methods perform poorly on cross-domain problems, especially in feature representation ability and adaptation to new tasks.
   - Visual Prompt Tuning (VPT) is effective, but it lacks prior semantic and spatial information at initialization, which leads to unsatisfactory feature adaptation.

### Solutions

1. **SAM-Aware Graph Prompt Reasoning Network (GPRN)**:
   - **SAM-aware Prompt Initialization (SPI)**: Convert the masks generated by SAM into visual prompts rich in high-level semantic information.
   - **Graph Prompt Reasoning (GPR)**: Construct a graph convolutional network (GCN) to reason about the relationships between visual prompts and ensure global semantic consistency.
   - **Adaptive Point Selection (APS)**: Select representative point prompts to improve the segmentation results and reduce background interference.
2. **Specific Contributions**:
   - Propose GPRN, which uses visual prompts derived from SAM-generated masks to assist feature representation learning, and reasons about the relationships between prompts through a GCN to achieve global semantic consistency.
   - Introduce the APS module, which selects positive and negative point prompts to refine the initial prediction and improve segmentation accuracy.
   - Achieve new state-of-the-art results on four standard CD-FSS datasets.

### Summary

This paper tackles the domain gap in cross-domain few-shot segmentation by combining SAM with graph convolutional networks, improving the generalization ability and segmentation accuracy of the model.
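
The APS idea of selecting positive and negative point prompts from a query prediction can be sketched as follows. The selection rule here is an assumption made for illustration, not the paper's criterion: the mean probability of the map serves as the adaptive threshold, the most confident pixels above it become positive points, and the least confident pixels at or below it become negative points. The function name `select_point_prompts` and its parameters are hypothetical. Points are returned as `(x, y)` with labels 1 (foreground) and 0 (background), which matches the point-prompt convention used by SAM.

```python
def select_point_prompts(prob_map, num_pos=2, num_neg=2):
    """Pick confident foreground and background pixels from an H x W probability map.

    Assumed rule: threshold at the mean probability, then take the highest
    probabilities above it as positives and the lowest ones below it as negatives.
    No learned parameters are involved.
    """
    pixels = [(p, x, y)
              for y, row in enumerate(prob_map)
              for x, p in enumerate(row)]
    threshold = sum(p for p, _, _ in pixels) / len(pixels)  # adaptive, per image

    foreground = sorted((t for t in pixels if t[0] > threshold), reverse=True)
    background = sorted(t for t in pixels if t[0] <= threshold)

    positives = [(x, y) for _, x, y in foreground[:num_pos]]
    negatives = [(x, y) for _, x, y in background[:num_neg]]

    points = positives + negatives
    labels = [1] * len(positives) + [0] * len(negatives)
    return points, labels

# Toy 3 x 3 foreground probability map from an initial query prediction.
prob_map = [
    [0.05, 0.10, 0.20],
    [0.15, 0.95, 0.80],
    [0.02, 0.70, 0.60],
]
points, labels = select_point_prompts(prob_map)
print(points, labels)
```

In a full pipeline these points and labels would be passed back to SAM as point prompts to refine the initial mask; that step is omitted here because it depends on the model setup.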

SAM-Aware Graph Prompt Reasoning Network for Cross-Domain Few-Shot Segmentation

APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation

Bridge the Points: Graph-based Few-shot Segment Anything Semantically

Sam-Rsp: A New Few-Shot Segmentation Method Based on Segment Anything Model and Rough Segmentation Prompts

Adaptive Prompt Learning with SAM for Few-shot Scanning Probe Microscope Image Segmentation

SAM-SP: Self-Prompting Makes SAM Great Again

PA-SAM: Prompt Adapter SAM for High-Quality Image Segmentation

SAM-Adapter: Adapting Segment Anything in Underperformed Scenes

DeSAM: Decoupled Segment Anything Model for Generalizable Medical Image Segmentation

RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation Based on Visual Foundation Model

AM-SAM: Automated Prompting and Mask Calibration for Segment Anything Model

MaskSAM: Towards Auto-prompt SAM with Mask Classification for Medical Image Segmentation

SAM Fails to Segment Anything? – SAM-Adapter: Adapting SAM in Underperformed Scenes: Camouflage, Shadow, Medical Image Segmentation, and More

BioSAM: Generating SAM Prompts From Superpixel Graph for Biological Instance Segmentation

SAM-RSIS: Progressively Adapting SAM With Box Prompting to Remote Sensing Image Instance Segmentation

SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation

ESP-MedSAM: Efficient Self-Prompting SAM for Universal Domain-Generalized Medical Image Segmentation

SAM-REF: Rethinking Image-Prompt Synergy for Refinement in Segment Anything

UN-SAM: Universal Prompt-Free Segmentation for Generalized Nuclei Images

RSAM-Seg: A SAM-based Approach with Prior Knowledge Integration for Remote Sensing Image Semantic Segmentation

Context-Aggregated and SAM-Guided Network for ViT-Based Instance Segmentation in Remote Sensing Images