Bridge the Points: Graph-based Few-shot Segment Anything Semantically

Anqi Zhang, Guangyu Gao, Jianbo Jiao, Chi Harold Liu, Yunchao Wei
2024-10-11
Abstract: The recent advancements in large-scale pre-training techniques have significantly enhanced the capabilities of vision foundation models, notably the Segment Anything Model (SAM), which can generate precise masks based on point and box prompts. Recent studies extend SAM to Few-shot Semantic Segmentation (FSS), focusing on prompt generation for SAM-based automatic semantic segmentation. However, these methods struggle with selecting suitable prompts, require specific hyperparameter settings for different scenarios, and experience prolonged one-shot inference times due to the overuse of SAM, resulting in low efficiency and limited automation ability. To address these issues, we propose a simple yet effective approach based on graph analysis. In particular, a Positive-Negative Alignment module dynamically selects the point prompts for generating masks, especially uncovering the potential of the background context as the negative reference. A subsequent Point-Mask Clustering module aligns the granularity of masks and selected points as a directed graph, based on mask coverage over points. These points are then aggregated by decomposing the weakly connected components of the directed graph in an efficient manner, constructing distinct natural clusters. Finally, the positive and overshooting gating, benefiting from graph-based granularity alignment, aggregate high-confidence masks and filter out the false-positive masks for final prediction, reducing the usage of additional hyperparameters and redundant mask generation. Extensive experimental analysis across standard FSS, One-shot Part Segmentation, and Cross Domain FSS datasets validates the effectiveness and efficiency of the proposed approach, surpassing state-of-the-art generalist models with a mIoU of 58.7% on COCO-20i and 35.2% on LVIS-92i. The code is available at https://andyzaq.github.io/GF-SAM/.
Computer Vision and Pattern Recognition
### What problems does this paper attempt to solve?

This paper aims to solve the following problems:

1. **Challenges in automated semantic segmentation**:
   - **Difficulty in point-prompt selection**: Existing methods struggle to select suitable point prompts, so the generated masks are imprecise or cover the target only partially.
   - **Dependence on hyperparameters**: Different scenarios require specific hyperparameter settings, which adds complexity and manual tuning.
   - **Long inference time**: Because they invoke SAM (Segment Anything Model) too often, existing methods have prolonged one-shot inference times, low efficiency, and limited automation ability.
2. **The relationship between fine-grained and coarse-grained features**:
   - Existing methods ignore the relationship between points (derived from fine-grained features) and masks (generated by SAM at a coarser granularity), which contributes to the inefficiency and limited automation noted above.
3. **Improving the efficiency and accuracy of Few-shot Semantic Segmentation (FSS)**:
   - The paper proposes a method based on graph analysis and representation learning that aims to improve the efficiency and accuracy of FSS through dynamic point-prompt selection, granularity alignment between points and masks, and efficient clustering.

### Overview of the solution

To address the above challenges, the paper proposes the following components:

- **Positive-Negative Alignment (PNA) module**:
  - Dynamically selects point prompts by aligning the target image against both foreground (positive) and background (negative) references, with particular emphasis on the background context as a negative reference.
- **Point-Mask Clustering (PMC) module**:
  - Models the relationship between points and masks as a directed graph built from which masks cover which points, then clusters the points by decomposing the graph into weakly connected components, thereby bridging fine-grained and coarse-grained features.
- **Post-Gating strategy**:
  - Comprises Positive Gating and Overshooting Gating, which aggregate high-confidence masks and filter out false-positive masks, reducing redundant mask generation and the need for additional hyperparameters.

According to the abstract, these components together surpass state-of-the-art generalist models, reaching a mIoU of 58.7% on COCO-20i and 35.2% on LVIS-92i.

### Experimental results

The experiments cover standard FSS datasets (such as PASCAL-5i, COCO-20i, FSS-1000, and LVIS-92i), the more challenging One-shot Part Segmentation datasets (such as PASCAL-Part and PACO-Part), and Cross Domain FSS datasets (such as DeepGlobe, ISIC, and iSAID-5i). The method performs strongly on the standard benchmarks and generalizes to the part-level and cross-domain settings.
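As an illustration of the Point-Mask Clustering step described in the overview, the following is a minimal sketch of grouping points by the weakly connected components of a coverage graph. It is not the authors' implementation. The function name `weakly_connected_clusters`, the `coverage` dictionary layout, and the convention that an edge `i -> j` means "the mask prompted from point `i` covers point `j`" are assumptions made for this example; the toy coverage sets stand in for masks that SAM would produce.

```python
from typing import Dict, List, Set


def weakly_connected_clusters(coverage: Dict[int, Set[int]]) -> List[List[int]]:
    """Group point indices into weakly connected components.

    coverage[i] holds the indices of the points covered by the mask that was
    prompted from point i (assumed convention: a directed edge i -> j).
    Weak connectivity ignores edge direction, so a union-find suffices.
    """
    # Register every point that appears as a source or a target.
    parent: Dict[int, int] = {p: p for p in coverage}
    for targets in coverage.values():
        for t in targets:
            parent.setdefault(t, t)

    def find(x: int) -> int:
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the endpoints of every edge, regardless of its direction.
    for src, targets in coverage.items():
        for dst in targets:
            ra, rb = find(src), find(dst)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)

    # Collect points by root to form the clusters.
    clusters: Dict[int, List[int]] = {}
    for p in parent:
        clusters.setdefault(find(p), []).append(p)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical toy input: five point prompts and the points their masks cover.
toy_coverage = {0: {0, 1}, 1: {1}, 2: {2, 3}, 3: {3}, 4: {4}}
print(weakly_connected_clusters(toy_coverage))  # [[0, 1], [2, 3], [4]]
```

In this toy case the masks from points 0 and 2 each cover a neighbouring point, so the five points collapse into three clusters. Union-find is one simple way to obtain weakly connected components in near-linear time; a graph library routine would serve equally well.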