Abstract: The Segment Anything Model (SAM) has gained significant recognition in the field of semantic segmentation due to its versatile capabilities and impressive performance. Despite its success, SAM faces two primary limitations: (1) it relies heavily on meticulous human-provided prompts such as key points, bounding boxes, or text messages, which are labor-intensive to produce; (2) the mask decoder's feature representation is sometimes inaccurate, because the decoder relies solely on a dot product operation at its final stage, which inadequately captures the correlations needed for precise segmentation. Current solutions to these problems, such as fine-tuning SAM, often require retraining a large number of parameters, which demands substantial time and computing resources. To address these limitations, we propose AM-SAM, an automated prompting and mask calibration method based on a bi-level optimization framework. Our approach automatically generates prompts for an input image, eliminating the need for human involvement, and it performs well in early training epochs, which leads to faster convergence. Additionally, we freeze the main part of SAM and modify the mask decoder with Low-Rank Adaptation (LoRA), enhancing the mask decoder's feature representation with techniques that go beyond simple dot product operations to capture and utilize feature correlations more accurately. Our experimental results demonstrate that AM-SAM achieves accurate segmentation, matching or exceeding the effectiveness of human-generated and default prompts. Notably, on the body segmentation dataset, our method yields a 5% higher Dice score than the SOTA method when trained on a 4-example few-shot training set, underscoring its effectiveness in semantic segmentation tasks.
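
For readers unfamiliar with the Dice score reported above, the following is a minimal sketch of the standard definition for binary masks, Dice = 2|P ∩ T| / (|P| + |T|). It is not the authors' evaluation code: the function name `dice_score`, the nested-list mask format, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration; assumes both masks have the same shape.
    """
    # Flatten the 2D masks into flat lists of pixel labels.
    pred_flat = [p for row in pred for p in row]
    target_flat = [t for row in target for t in row]
    if len(pred_flat) != len(target_flat):
        raise ValueError("masks must have the same number of pixels")

    # Count pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred_flat, target_flat) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in pred_flat if p) + sum(1 for t in target_flat if t)

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    ground_truth = [[1, 0], [0, 0]]
    print(dice_score(predicted, ground_truth))
```

In this toy case one pixel overlaps out of three foreground pixels in total, so the score is 2 × 1 / 3. A perfect prediction scores 1.0 and a fully disjoint one scores 0.0.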

Semantic-Enhanced Point-Box Joint Prompting for Video Object Segmentation

SAM-PD: How Far Can SAM Take Us in Tracking and Segmenting Anything in Videos by Prompt Denoising

Training-Free Robust Interactive Video Object Segmentation

Sam-Rsp: A New Few-Shot Segmentation Method Based on Segment Anything Model and Rough Segmentation Prompts

Segment Anything Meets Point Tracking

SAM 2 in Robotic Surgery: An Empirical Evaluation for Robustness and Generalization in Surgical Video Segmentation

SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation

Robust Box Prompt based SAM for Medical Image Segmentation

Improved Intelligent Scissors and Snake based VOP Interpolation for Semiautomatic Video Object Segmentation

SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners

Video Object Segmentation via SAM 2: The 4th Solution for LSVOS Challenge VOS Track

AM-SAM: Automated Prompting and Mask Calibration for Segment Anything Model

SAM-SP: Self-Prompting Makes SAM Great Again

Learning to Prompt Segment Anything Models

Prompt-Based Segmentation at Multiple Resolutions and Lighting Conditions using Segment Anything Model 2

PA-SAM: Prompt Adapter SAM for High-Quality Image Segmentation

AI-SAM: Automatic and Interactive Segment Anything Model

Video Object Segmentation with Dynamic Query Modulation

Learning Spatial-Semantic Features for Robust Video Object Segmentation

UVOSAM: A Mask-free Paradigm for Unsupervised Video Object Segmentation via Segment Anything Model

All-in-SAM: from Weak Annotation to Pixel-wise Nuclei Segmentation with Prompt-based Finetuning