Abstract: We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy-respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive -- often competitive with or even superior to prior fully supervised results. We are releasing the Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and 11M images at https://segment-anything.com to foster research into foundation models for computer vision.

What problem does this paper attempt to address?

This paper sets out to build a foundation model for image segmentation: a model that can be prompted to transfer zero-shot to new data distributions and tasks. It proposes three interrelated components:

1. **Promptable Segmentation Task**: A general task in which the model must return a valid segmentation mask for any type of prompt (points, boxes, text, and so on). Even when the prompt is ambiguous, the model should return a reasonable mask.
2. **Segment Anything Model (SAM)**: A model designed to support flexible prompts and to generate segmentation masks in real time or near real time. It consists of an image encoder, a prompt encoder, and a lightweight mask decoder; it handles multiple prompt types and can output several candidate masks when a prompt is ambiguous.
3. **Data Engine**: Because existing segmentation datasets are limited in scale, the authors developed a data engine. Over three stages (model-assisted manual annotation, semi-automatic annotation, and fully automatic annotation), it collected more than 1 billion segmentation masks, producing SA-1B, the largest segmentation dataset to date.

Together, these components aim to yield a powerful segmentation foundation model: one that not only performs well on its training data but also transfers zero-shot to unseen data distributions and tasks through prompt engineering, and so addresses a wide range of downstream segmentation problems.
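To make the SAM component described above more concrete, here is a minimal toy sketch of a promptable segmenter that returns several candidate masks for an ambiguous point prompt. It is not the SAM implementation: `encode_image`, `segment`, `flood_fill`, `ScoredMask`, and the tolerance-based scores are all hypothetical stand-ins, and the "image" is a small grid of integers in which nested regions mimic an object, a part, and a sub-part.

```python
from dataclasses import dataclass
from typing import List, Tuple

Grid = List[List[int]]


@dataclass
class ScoredMask:
    mask: Grid      # 1 where the pixel belongs to the mask, else 0
    score: float    # toy confidence, NOT a real predicted IoU


def encode_image(image: Grid) -> Grid:
    """Stand-in for the heavy image encoder (assumption: computed once per image)."""
    return image  # the toy "embedding" is simply the image itself


def flood_fill(grid: Grid, seed: Tuple[int, int], tol: int) -> Grid:
    """Grow a region from `seed` over 4-connected pixels within `tol` of the seed value."""
    rows, cols = len(grid), len(grid[0])
    target = grid[seed[0]][seed[1]]
    mask = [[0] * cols for _ in range(rows)]
    stack = [seed]
    while stack:
        r, c = stack.pop()
        if not (0 <= r < rows and 0 <= c < cols):
            continue
        if mask[r][c] or abs(grid[r][c] - target) > tol:
            continue
        mask[r][c] = 1
        stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return mask


def segment(embedding: Grid, point: Tuple[int, int], multimask: bool = True) -> List[ScoredMask]:
    """Toy prompt encoder + mask decoder: one point in, nested candidate masks out."""
    candidates = [
        ScoredMask(flood_fill(embedding, point, tol), 1.0 / (1 + tol))
        for tol in (0, 1, 2)  # tighter tolerance gives a smaller, more specific mask
    ]
    if multimask:
        return candidates
    return [max(candidates, key=lambda m: m.score)]


def area(mask: Grid) -> int:
    """Number of pixels set in a mask."""
    return sum(sum(row) for row in mask)


# Nested regions: background (0), object (4), part (5), sub-part (6).
IMAGE = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 4, 4, 4, 4, 4, 0],
    [0, 4, 5, 5, 5, 4, 0],
    [0, 4, 5, 6, 5, 4, 0],
    [0, 4, 5, 5, 5, 4, 0],
    [0, 4, 4, 4, 4, 4, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    embedding = encode_image(IMAGE)            # encode once
    for cand in segment(embedding, (3, 3)):    # then answer a prompt cheaply
        print(area(cand.mask), round(cand.score, 2))
```

A click on the centre pixel is ambiguous between the sub-part, the part, and the whole object, so the toy decoder returns all three masks and leaves the choice to the caller. Passing `multimask=False` keeps only the highest-scoring candidate.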

Segment Anything

SAM 2: Segment Anything in Images and Videos

SA-Med2D-20M Dataset: Segment Anything in 2D Medical Imaging with 20 Million masks

Segment Anything in High Quality

Segment anything, from space?

SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model

Semantic-SAM: Segment and Recognize Anything at Any Granularity

The Segment Anything Model (SAM) for Remote Sensing Applications: From Zero to One Shot

A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering

SAM-Adapter: Adapting Segment Anything in Underperformed Scenes

SAM Fails to Segment Anything? – SAM-Adapter: Adapting SAM in Underperformed Scenes: Camouflage, Shadow, Medical Image Segmentation, and More

Segment Anything without Supervision

Segment Anything in Medical Images

Segment Anything Is Not Always Perfect: An Investigation of SAM on Different Real-world Applications

Segment anything model 2: an application to 2D and 3D medical images

PA-SAM: Prompt Adapter SAM for High-Quality Image Segmentation

Segment Anything Model (SAM) for Digital Pathology: Assess Zero-shot Segmentation on Whole Slide Imaging

Segment Any Object Model (SAOM): Real-to-Simulation Fine-Tuning Strategy for Multi-Class Multi-Instance Segmentation

SAM.MD: Zero-shot medical image segmentation capabilities of the Segment Anything Model