Abstract: The Segment Anything Model (SAM) addresses two practical yet challenging segmentation tasks: segment anything (SegAny), which uses a point prompt to predict the mask for a single object of interest, and segment everything (SegEvery), which predicts the masks for all objects in the image. What makes SegAny slow for SAM is its heavyweight image encoder, which MobileSAM has addressed via decoupled knowledge distillation. The efficiency bottleneck of SegEvery with SAM, however, lies in its mask decoder, because it must first generate numerous masks from redundant grid-search prompts and then filter them to obtain the final valid masks. We propose to improve its efficiency by directly generating the final masks from only valid prompts, which can be obtained through object discovery. Our approach not only reduces the total time spent in the mask decoder by at least 16 times but also achieves superior performance. Specifically, it yields an average improvement of 3.6 percentage points (42.5% vs. 38.9%) for zero-shot object proposal on the LVIS dataset with the mask AR@$K$ metric. Qualitative results show that our approach generates fine-grained masks while avoiding over-segmentation. This project, which targets faster SegEvery than the original SAM, is termed MobileSAMv2 to differentiate it from MobileSAM, which targets faster SegAny. Moreover, we demonstrate that our new prompt sampling is also compatible with the distilled image encoders in MobileSAM, contributing to a unified framework for efficient SegAny and SegEvery. The code is available at the same link as the MobileSAM project: https://github.com/ChaoningZhang/MobileSAM.
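The contrast between grid-search prompts and object-aware prompts can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 32-points-per-side grid, the score threshold, the prompt cap, and the toy detector output are all assumptions chosen for illustration, and the abstract does not specify how object discovery is performed. The sketch only shows why the number of prompts the mask decoder must process can shrink sharply when prompts come from discovered objects rather than a dense grid.

```python
import numpy as np


def grid_point_prompts(points_per_side: int) -> np.ndarray:
    """Dense grid of normalized (x, y) point prompts covering the image."""
    offset = 1.0 / (2 * points_per_side)
    coords = np.linspace(offset, 1.0 - offset, points_per_side)
    xs, ys = np.meshgrid(coords, coords)
    return np.stack([xs.ravel(), ys.ravel()], axis=-1)


def object_aware_prompts(
    boxes: np.ndarray,
    scores: np.ndarray,
    score_thresh: float = 0.5,  # assumed value, not taken from the paper
    max_prompts: int = 100,     # assumed cap, not taken from the paper
) -> np.ndarray:
    """Keep confident boxes from a (hypothetical) object detector as box prompts."""
    keep = np.flatnonzero(scores >= score_thresh)
    order = keep[np.argsort(-scores[keep])]  # highest score first
    return boxes[order[:max_prompts]]


# Grid-search prompting: every grid point becomes a decoder prompt,
# and most of the resulting masks are filtered out afterwards.
grid = grid_point_prompts(32)

# Object-aware prompting: toy detector output in (x1, y1, x2, y2) pixels.
boxes = np.array([[10, 20, 110, 220], [50, 60, 90, 120], [0, 0, 5, 5]], dtype=float)
scores = np.array([0.9, 0.7, 0.1])
prompts = object_aware_prompts(boxes, scores)

print(f"grid prompts: {len(grid)}, object-aware prompts: {len(prompts)}")
```

In this toy setting the decoder would receive 1024 point prompts under grid search but only the two confident boxes under object-aware sampling, so no post-hoc mask filtering step is needed for the discarded low-score box.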

SqueezeSAM: User friendly mobile interactive segmentation

Faster Segment Anything: Towards Lightweight SAM for Mobile Applications

RepViT-SAM: Towards Real-Time Segmenting Anything

MobileSAMv2: Faster Segment Anything to Everything

TinySAM: Pushing the Envelope for Efficient Segment Anything Model

A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering

SAM 2: Segment Anything in Images and Videos

SAM-Lightening: A Lightweight Segment Anything Model with Dilated Flash Attention to Achieve 30 times Acceleration

FocSAM: Delving Deeply into Focused Objects in Segmenting Anything

Efficient Track Anything

RobustSAM: Segment Anything Robustly on Degraded Images

SimSAM: Zero-shot Medical Image Segmentation via Simulated Interaction

On Efficient Variants of Segment Anything Model: A Survey

SAMM (Segment Any Medical Model): A 3D Slicer Integration to SAM

FastSAM3D: An Efficient Segment Anything Model for 3D Volumetric Medical Images

Segment Anything with Multiple Modalities

RAP-SAM: Towards Real-Time All-Purpose Segment Anything

Segment anything model 2: an application to 2D and 3D medical images

EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM

EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything