Segment Any Class (SAC): Multi-Class Few-Shot Semantic Segmentation via Class Region Proposals

Hussni Mohd Zakir, Eric Tatt Wei Ho
2024-11-21
Abstract: The Segment-Anything Model (SAM) is a vision foundation model for segmentation with a prompt-driven framework. SAM generates class-agnostic masks based on user-specified instance-referring prompts. However, adapting SAM for automated segmentation -- where manual input is absent -- of specific object classes often requires additional model training. We present Segment Any Class (SAC), a novel, training-free approach that task-adapts SAM for multi-class segmentation. SAC generates Class-Region Proposals (CRP) on query images which allows us to automatically generate class-aware prompts on probable locations of class instances. CRPs are derived from elementary intra-class and inter-class feature distinctions without any additional training. Our method is versatile, accommodating any N-way K-shot configurations for the multi-class few-shot semantic segmentation (FSS) task. Unlike gradient-learning adaptation of generalist models which risk the loss of generalization and potentially suffer from catastrophic forgetting, SAC solely utilizes automated prompting and achieves superior results over state-of-the-art methods on the COCO-20i benchmark, particularly excelling in high N-way class scenarios. SAC is an interesting demonstration of a prompt-only approach to adapting foundation models for novel tasks with small, limited datasets without any modifications to the foundation model itself. This method offers interesting benefits such as intrinsic immunity to concept or feature loss and rapid, online task adaptation of foundation models.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to adapt a foundation model such as the Segment-Anything Model (SAM) to multi-class few-shot semantic segmentation without any additional training. Specifically:

1. **Problem Background**:
   - Foundation models such as SAM generate class-agnostic masks from user-specified, instance-referring prompts, but automatically segmenting specific object classes usually requires additional model training.
   - In multi-class few-shot semantic segmentation, directly updating model weights can cause catastrophic forgetting: learning the new task overwrites previously learned knowledge.

2. **Research Objectives**:
   - Propose a training-free method that lets the foundation model adapt automatically to multi-class segmentation tasks.
   - Generate Class-Region Proposals (CRP) automatically, so that prompts can be placed on the probable locations of class instances.
   - Evaluate the method on the COCO-20i benchmark, especially in high N-way (many-class) scenarios.

3. **Solutions**:
   - Propose the Segment Any Class (SAC) method, which performs multi-class few-shot semantic segmentation without gradient learning by combining the DINOv2 and SAM foundation models.
   - SAC works in two steps:
     1. **Support Feature Extraction**: Use DINOv2 to extract features from the support images and construct a Category-Representative Feature Array (CRFA).
     2. **Mask Prediction and Class Region Proposals**: At inference, compute the similarity map between the query image and the CRFA to obtain Class-Region Proposals (CRP), then derive from them a set of prompts that SAM uses for segmentation.

4. **Innovations**:
   - SAC avoids gradient learning entirely and leaves the foundation model weights unchanged, so catastrophic forgetting cannot occur.
   - Because the prompt sets are generated automatically, SAC can adapt quickly to new tasks from small datasets.
   - It performs well in multi-class few-shot segmentation tasks, especially in high N-way scenarios.

In conclusion, the paper aims to adapt a foundation model to multi-class few-shot semantic segmentation without modifying the model's weights, and proposes a method based on automatic prompt generation to do so.
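
The support-to-prompt pipeline summarized above can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes that the class-representative features are simple masked averages of patch features, that the similarity map is cosine similarity, and that prompts are the highest-scoring patch locations inside each proposed region. The function names (`build_prototypes`, `class_region_proposals`, `point_prompts`), the `threshold` parameter, and the synthetic features standing in for DINOv2 output are all hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def build_prototypes(support_feats, support_masks, num_classes):
    """Average support patch features per class (assumed stand-in for a CRFA).

    support_feats: (S, H, W, D) patch features; support_masks: (S, H, W)
    integer labels where 0 is background and 1..num_classes are classes.
    """
    protos = np.zeros((num_classes, support_feats.shape[-1]))
    for c in range(1, num_classes + 1):
        selected = support_masks == c
        if selected.any():
            protos[c - 1] = support_feats[selected].mean(axis=0)
    return l2_normalize(protos)

def class_region_proposals(query_feats, protos, threshold=0.5):
    """Label each query patch with its most similar class, or 0 if below threshold."""
    sim = l2_normalize(query_feats) @ protos.T        # (H, W, C) cosine similarity
    region_map = np.where(sim.max(axis=-1) >= threshold, sim.argmax(axis=-1) + 1, 0)
    return region_map, sim

def point_prompts(region_map, sim, num_classes, points_per_class=1):
    """Pick the highest-similarity patches in each class region as point prompts."""
    prompts = {}
    for c in range(1, num_classes + 1):
        ys, xs = np.nonzero(region_map == c)
        if ys.size == 0:
            continue
        order = np.argsort(sim[ys, xs, c - 1])[::-1][:points_per_class]
        prompts[c] = [(int(ys[i]), int(xs[i])) for i in order]
    return prompts

# Synthetic stand-in for DINOv2 patch features: one direction per label plus noise.
rng = np.random.default_rng(0)
basis = np.eye(8)

def toy_features(labels):
    return basis[labels] + 0.01 * rng.standard_normal(labels.shape + (8,))

support_labels = np.array([[[1, 1, 0, 0],
                            [1, 1, 0, 0],
                            [0, 0, 2, 2],
                            [0, 0, 2, 2]]])              # 2-way, 1-shot toy support set
query_labels = np.array([[0, 2, 2, 0],
                         [0, 2, 2, 0],
                         [1, 1, 0, 0],
                         [1, 1, 0, 0]])

protos = build_prototypes(toy_features(support_labels), support_labels, num_classes=2)
region_map, sim = class_region_proposals(toy_features(query_labels), protos)
prompts = point_prompts(region_map, sim, num_classes=2)
print(region_map)
print(prompts)
```

In a real pipeline the prompt coordinates, which here index the patch grid, would need to be scaled to pixel coordinates before being handed to SAM as class-labelled point prompts. No weights are updated at any stage, which is the property the summary emphasizes.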