SAM-Aware Graph Prompt Reasoning Network for Cross-Domain Few-Shot Segmentation

Shi-Feng Peng, Guolei Sun, Yong Li, Hongsong Wang, Guo-Sen Xie
2024-12-31
Abstract: The primary challenge of cross-domain few-shot segmentation (CD-FSS) is the domain disparity between the training and inference phases, which can exist in either the input data or the target classes. Previous models struggle to learn feature representations that generalize to various unknown domains from limited training domain samples. In contrast, the large-scale visual model SAM, pre-trained on tens of millions of images from various domains and classes, possesses excellent generalizability. In this work, we propose a SAM-aware graph prompt reasoning network (GPRN) that fully leverages SAM to guide CD-FSS feature representation learning and improve prediction accuracy. Specifically, we propose a SAM-aware prompt initialization module (SPI) to transform the masks generated by SAM into visual prompts enriched with high-level semantic information. Since SAM tends to divide an object into many sub-regions, this may lead to visual prompts representing the same semantic object having inconsistent or fragmented features. We further propose a graph prompt reasoning (GPR) module that constructs a graph among visual prompts to reason about their interrelationships and enable each visual prompt to aggregate information from similar prompts, thus achieving global semantic consistency. Subsequently, each visual prompt embeds its semantic information into the corresponding mask region to assist in feature representation learning. To refine the segmentation mask during testing, we also design a non-parameter adaptive point selection module (APS) to select representative point prompts from query predictions and feed them back to SAM to refine inaccurate segmentation results. Experiments on four standard CD-FSS datasets demonstrate that our method establishes new state-of-the-art results. Code: https://github.com/CVL-hub/GPRN
Computer Vision and Pattern Recognition, Machine Learning
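The adaptive point selection (APS) step in the abstract picks point prompts from the query prediction and feeds them back to SAM. The abstract does not give the selection rule, so the following is only a minimal sketch under an assumed rule: take the most confident foreground pixels as positive points and the most confident background pixels as negative points. The function name `select_point_prompts`, its parameters, and the 0.5 threshold are hypothetical and are not taken from the paper or its code.

```python
import numpy as np

def select_point_prompts(prob, num_pos=3, num_neg=3, fg_thresh=0.5):
    """Pick confident foreground/background pixels from a probability map.

    ASSUMPTION: this confidence-ranking rule is a stand-in, not the paper's APS.
    prob: (H, W) foreground probabilities predicted for the query image.
    Returns (coords, labels): coords is (K, 2) in (x, y) pixel order,
    labels is (K,) with 1 = positive point and 0 = negative point.
    """
    h, w = prob.shape
    flat = prob.ravel()
    order = np.argsort(flat)  # flat indices, ascending probability
    # Highest-probability pixels, kept only if they are actually foreground.
    pos_idx = [i for i in order[::-1][:num_pos] if flat[i] >= fg_thresh]
    # Lowest-probability pixels, kept only if they are actually background.
    neg_idx = [i for i in order[:num_neg] if flat[i] < fg_thresh]
    idx = np.array(pos_idx + neg_idx, dtype=int)
    ys, xs = np.unravel_index(idx, (h, w))
    coords = np.stack([xs, ys], axis=1)
    labels = np.array([1] * len(pos_idx) + [0] * len(neg_idx))
    return coords, labels
```

The returned arrays use (x, y) coordinates with 1/0 labels, which is the layout that point-prompt interfaces such as `SamPredictor.predict(point_coords=..., point_labels=...)` in the `segment_anything` package expect. Whether the authors call SAM this way is not stated in the abstract.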
What problem does this paper attempt to address?
This paper addresses the domain gap in cross-domain few-shot segmentation (CD-FSS). The challenge of CD-FSS lies in the domain differences between the training and inference stages, which may exist in the input data or in the target classes. Previous models, trained on limited samples from the training domain, have difficulty learning feature representations that generalize to unknown domains. To address this, the paper proposes a Graph Prompt Reasoning Network (GPRN) built on SAM (Segment Anything Model), which uses SAM to guide CD-FSS feature representation learning and improve prediction accuracy.

### Problem Background

1. **Challenges of Cross-Domain Few-Shot Segmentation**:
   - The training and test datasets belong to different domains.
   - The model has difficulty learning, from limited training samples, feature representations that generalize to different domains.
2. **Limitations of Existing Methods**:
   - Existing methods perform poorly on cross-domain problems, especially in feature representation ability and adaptation to new tasks.
   - Visual Prompt Tuning (VPT) is effective, but its prompts lack prior semantic and spatial information at initialization, which leads to unsatisfactory feature adaptation.

### Solutions

1. **SAM-Aware Graph Prompt Reasoning Network (GPRN)**:
   - **SAM-aware Prompt Initialization (SPI)**: Convert the masks generated by SAM into visual prompts rich in high-level semantic information.
   - **Graph Prompt Reasoning (GPR)**: Construct a graph convolutional network (GCN) over the visual prompts to reason about their relationships and ensure global semantic consistency.
   - **Adaptive Point Selection (APS)**: Select representative point prompts to refine the segmentation results and reduce background interference.
2. **Specific Contributions**:
   - Propose GPRN, which uses visual prompts derived from SAM masks to assist feature representation learning and reasons about the relationships between prompts through a GCN to achieve global semantic consistency.
   - Introduce the APS module, which selects positive and negative point prompts to refine the initial prediction and improve segmentation accuracy.
   - Achieve new state-of-the-art results on four standard CD-FSS datasets.

### Summary

This paper addresses the domain gap in cross-domain few-shot segmentation by combining SAM with graph convolutional networks, improving the model's generalization ability and segmentation accuracy.
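The SPI and GPR steps described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes masked average pooling for prompt initialization, and it replaces the learned GCN with a single parameter-free aggregation step over a thresholded cosine-similarity graph. All function names, the `threshold` parameter, and the additive embedding step are hypothetical.

```python
import numpy as np

def init_prompts(features, masks):
    """Masked average pooling: one prompt vector per SAM mask (assumed SPI stand-in).

    features: (C, H, W) feature map; masks: (N, H, W) binary masks.
    """
    flat_feat = features.reshape(features.shape[0], -1)          # (C, HW)
    flat_mask = masks.reshape(masks.shape[0], -1).astype(float)  # (N, HW)
    area = np.maximum(flat_mask.sum(axis=1, keepdims=True), 1.0) # avoid division by zero
    return flat_mask @ flat_feat.T / area                        # (N, C)

def graph_reason(prompts, threshold=0.5):
    """One aggregation step over a similarity graph (simplified GPR stand-in, no learned weights)."""
    norm = np.maximum(np.linalg.norm(prompts, axis=1, keepdims=True), 1e-8)
    unit = prompts / norm
    sim = unit @ unit.T                                           # (N, N) cosine similarity
    adj = np.where(sim >= threshold, np.maximum(sim, 0.0), 0.0)   # keep edges between similar prompts
    np.fill_diagonal(adj, 1.0)                                    # self-loops
    adj = adj / adj.sum(axis=1, keepdims=True)                    # row-normalize
    return adj @ prompts                                          # each prompt averages its neighbours

def embed_prompts(features, masks, prompts):
    """Add each prompt to the feature map inside its own mask region (assumed fusion rule)."""
    out = features.astype(float).copy()
    for mask, prompt in zip(masks, prompts):
        out += prompt[:, None, None] * mask[None].astype(float)
    return out
```

In this sketch, two prompts that come from fragments of the same object end up with nearly identical vectors after `graph_reason`, while a dissimilar prompt is left unchanged. That is the consistency effect the summary attributes to GPR, shown here without any trained parameters.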