Abstract: Driven by large-data pre-training, Segment Anything Model (SAM) has been demonstrated as a powerful and promptable framework, revolutionizing the segmentation models. Despite the generality, customizing SAM for specific visual concepts without man-powered prompting is under explored, e.g., automatically segmenting your pet dog in different images. In this paper, we propose a training-free Personalization approach for SAM, termed as PerSAM. Given only a single image with a reference mask, PerSAM first localizes the target concept by a location prior, and segments it within other images or videos via three techniques: target-guided attention, target-semantic prompting, and cascaded post-refinement. In this way, we effectively adapt SAM for private use without any training. To further alleviate the mask ambiguity, we present an efficient one-shot fine-tuning variant, PerSAM-F. Freezing the entire SAM, we introduce two learnable weights for multi-scale masks, only training 2 parameters within 10 seconds for improved performance. To demonstrate our efficacy, we construct a new segmentation dataset, PerSeg, for personalized evaluation, and test our methods on video object segmentation with competitive performance. Besides, our approach can also enhance DreamBooth to personalize Stable Diffusion for text-to-image generation, which discards the background disturbance for better target appearance learning. Code is released at https://github.com/ZrrSkywalker/Personalize-SAM
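
To make the "location prior" step in the abstract more concrete, here is a minimal sketch of one plausible reading: average the reference features under the mask into a target embedding, score every position of a new image by cosine similarity, and take the best-scoring position as a candidate point prompt. The abstract does not spell out this computation, so the use of cosine similarity, the averaging step, the toy two-dimensional features, and the function names `target_embedding` and `location_prior` are all assumptions made for illustration. No real SAM encoder is called.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def target_embedding(ref_features, ref_mask):
    """Average the reference features over positions where the mask is 1."""
    selected = [f for row_f, row_m in zip(ref_features, ref_mask)
                for f, m in zip(row_f, row_m) if m]
    if not selected:
        raise ValueError("reference mask is empty")
    dim = len(selected[0])
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]

def location_prior(test_features, target):
    """Return the similarity map and the (row, col) of its maximum."""
    sim = [[cosine(f, target) for f in row] for row in test_features]
    best = max(((r, c) for r in range(len(sim)) for c in range(len(sim[0]))),
               key=lambda rc: sim[rc[0]][rc[1]])
    return sim, best

# Toy 2x2 feature maps (hypothetical stand-ins for image-encoder output).
ref_features = [[[1.0, 0.0], [0.0, 1.0]],
                [[1.0, 0.1], [0.0, 1.0]]]
ref_mask = [[1, 0],
            [1, 0]]  # the target occupies the left column of the reference
test_features = [[[0.0, 1.0], [0.0, 2.0]],
                 [[0.1, 1.0], [2.0, 0.1]]]

target = target_embedding(ref_features, ref_mask)
sim, best = location_prior(test_features, target)
print("most similar position:", best)
```

In this toy example the bottom-right feature of the test map points in the same direction as the target embedding, so it receives the highest similarity and would be the position passed on as a prompt.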

What problem does this paper attempt to address?

The paper aims to address the problem of how to personalize a general segmentation model (such as the Segment Anything Model, SAM) to automatically segment specific visual concepts without additional training. Specifically, the paper proposes two methods: one is a training-free personalization method called PerSAM, and the other is an efficient fine-tuning method based on PerSAM called PerSAM-F.

### Main Research Questions

1. **Personalized Object Segmentation**: How to simply and efficiently customize a general segmentation model (such as SAM) to automatically segment user-specified visual concepts, such as specific objects like pet dogs or clocks.
2. **Improving Segmentation Accuracy**: Addressing the issue of segmentation scale ambiguity caused by different sub-parts or hierarchical structures within specific objects to improve segmentation accuracy.
3. **Enhancing Personalized Text-to-Image Synthesis**: Improving the quality of personalized text-to-image generation by reducing background interference.

### Solutions

1. **PerSAM**: A training-free personalization method is proposed, which can achieve personalized segmentation with a single data instance (one reference image and its mask). This method utilizes target-guided attention and target-semantic prompting to enhance segmentation performance.
2. **PerSAM-F**: To address the issue of segmentation scale ambiguity, an efficient fine-tuning variant called PerSAM-F is proposed. By adjusting only two parameters, fine-tuning is completed within 10 seconds, solving the selection problem in multi-scale segmentation and further improving segmentation performance.
3. **Assisting DreamBooth**: By assisting DreamBooth with PerSAM, background information interference during training is removed, resulting in higher quality personalized images.

### Experimental Validation

- The paper constructs a new evaluation dataset called PerSeg and validates the effectiveness of the proposed methods on multiple benchmarks.
- Experimental results show that PerSAM and PerSAM-F achieve significant results in personalized object segmentation tasks, especially when dealing with hierarchical objects.
- Additionally, by assisting DreamBooth with PerSAM, the quality of personalized text-to-image synthesis is significantly improved.
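
The PerSAM-F idea listed under Solutions (two learnable parameters that resolve the choice among multi-scale masks) can be illustrated with a small sketch. It assumes the three mask scales are blended by a weighted sum in which the third weight is fixed as `1 - w1 - w2`, so only `w1` and `w2` are trained; the summary above does not state this formula, so treat it as an assumption. The mean-squared-error loss, the plain gradient descent loop, the equal-weight initialization, and the flattened four-pixel masks are likewise simplifications chosen for illustration, not the paper's training setup.

```python
def fuse(masks, w1, w2):
    """Weighted sum of three mask score maps; the third weight is 1 - w1 - w2."""
    m1, m2, m3 = masks
    w3 = 1.0 - w1 - w2
    return [w1 * a + w2 * b + w3 * c for a, b, c in zip(m1, m2, m3)]

def fit_weights(masks, target, steps=200, lr=0.5):
    """Fit the two free weights by gradient descent on mean squared error."""
    m1, m2, m3 = masks
    n = len(target)
    w1 = w2 = 1.0 / 3.0  # start from an equal blend (assumed initialization)
    for _ in range(steps):
        fused = fuse(masks, w1, w2)
        err = [f - t for f, t in zip(fused, target)]
        # d(fused)/d(w1) = m1 - m3 and d(fused)/d(w2) = m2 - m3
        g1 = 2.0 / n * sum(e * (a - c) for e, a, c in zip(err, m1, m3))
        g2 = 2.0 / n * sum(e * (b - c) for e, b, c in zip(err, m2, m3))
        w1 -= lr * g1
        w2 -= lr * g2
    return w1, w2

# Toy flattened masks (hypothetical): sub-part, part, and whole object.
masks = ([1.0, 0.0, 0.0, 0.0],
         [1.0, 1.0, 0.0, 0.0],
         [1.0, 1.0, 1.0, 1.0])
reference = [1.0, 1.0, 0.0, 0.0]  # the one-shot reference matches the "part" scale

w1, w2 = fit_weights(masks, reference)
binary = [1 if s > 0.5 else 0 for s in fuse(masks, w1, w2)]
print(round(w1, 3), round(w2, 3), binary)
```

Because the reference mask coincides with the middle scale, the fitted weights move toward `w1 = 0` and `w2 = 1`, and thresholding the fused map recovers that scale. A real implementation would operate on SAM's mask outputs with an autograd framework instead of hand-written gradients.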

Personalize Segment Anything Model with One Shot
