Abstract: The Segment Anything Model (SAM), developed by Meta AI Research, represents a significant breakthrough in computer vision, offering a robust framework for image and video segmentation. This survey provides a comprehensive exploration of the SAM family, including SAM and SAM 2, highlighting their advancements in granularity and contextual understanding. Our study demonstrates SAM's versatility across a wide range of applications while identifying areas where improvements are needed, particularly in scenarios requiring high granularity and in the absence of explicit prompts. By mapping the evolution and capabilities of SAM models, we offer insights into their strengths and limitations and suggest future research directions, including domain-specific adaptations and enhanced memory and propagation mechanisms. We believe that this survey comprehensively covers the breadth of SAM's applications and challenges, setting the stage for ongoing advancements in segmentation technology.

What problem does this paper attempt to address?

This paper addresses image and video segmentation in computer vision. Specifically, it surveys the Segment Anything Model (SAM), developed by Meta AI Research, and its second-generation version, SAM 2. These models provide a robust framework for image and video segmentation, with notable progress in granularity and contextual understanding. The main objective of the paper is to explore the SAM family (SAM and SAM 2) comprehensively, highlight its progress in granularity and contextual understanding, and demonstrate its versatility across applications. The paper also identifies where these models still need improvement, particularly in scenarios that require high granularity or lack explicit prompts.

### Core Problems of the Paper

1. **Fine-grained Segmentation**: How to increase the granularity of object recognition in images and videos, that is, how to segment the parts of an object in more detail, for example, decomposing a teapot into its lid, bowl and handle.
2. **Context Understanding**: How to enhance the model's ability to understand complex scenes, so that AI systems can more accurately interpret the relationships and interactions between objects in images and videos.
3. **Real-time Processing**: How to handle the temporal dimension of dynamic visual data, in particular maintaining high precision with little user interaction in real-time video processing.
4. **Application Areas**: How to apply these models in multiple fields, such as augmented reality, autonomous driving systems and medical imaging.

### Main Contributions of the Paper

- **Technological Innovation**: Introduces the technical details of SAM and SAM 2, especially their innovations in fine-grained segmentation and context understanding.
- **Wide Application**: Demonstrates the applications of these models in multiple fields, such as image generation, restoration, annotation and matching, and video generation, annotation and tracking.
- **Future Directions**: Proposes directions for future research, including domain-specific adaptations and enhanced memory and propagation mechanisms.

### Structure of the Paper

- **Introduction**: Introduces the development background and importance of SAM and its second-generation version, SAM 2.
- **Methods**: Describes the research methods of the paper, including the literature collection and screening process.
- **Importance of Fine-grained Segmentation**: Discusses in detail the importance of fine-grained segmentation in computer vision tasks and the contributions of SAM and SAM 2 in this regard.
- **Historical Technologies and Development**: Reviews the historical development and technological evolution of fine-grained segmentation.
- **Granularity in Object Recognition**: Explores the role of granularity in object recognition from a theoretical perspective.
- **SAM-driven Applications**: Introduces in detail the applications of SAM and SAM 2 in different fields, including image and video generation, restoration, annotation and matching.
- **Conclusions and Future Directions**: Summarizes the achievements and limitations of the SAM family and proposes future research directions.

Together, these sections aim to give readers a comprehensive view of the latest progress of the SAM family in computer vision and of its potential applications.
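As a rough illustration of what "granularity" means in the teapot example, the sketch below represents an object-level mask as the union of part-level masks and compares each part with the whole using intersection over union (IoU). This is a toy example under stated assumptions, not code from the paper or from the SAM libraries: the 4x6 grid, the mask shapes, and the helper names `mask_from_rows`, `union` and `iou` are all hypothetical.

```python
from typing import List

Mask = List[List[bool]]


def mask_from_rows(rows: List[str]) -> Mask:
    """Parse a toy mask: '#' marks a foreground pixel, anything else is background."""
    return [[ch == "#" for ch in row] for row in rows]


def union(masks: List[Mask]) -> Mask:
    """Pixel-wise OR of equally sized masks."""
    height, width = len(masks[0]), len(masks[0][0])
    return [[any(m[y][x] for m in masks) for x in range(width)] for y in range(height)]


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two equally sized masks."""
    inter = sum(pa and pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    uni = sum(pa or pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return inter / uni if uni else 1.0


# Hypothetical part-level masks for a 4x6 "teapot" image (illustrative only).
lid = mask_from_rows([
    "..##..",
    "......",
    "......",
    "......",
])
bowl = mask_from_rows([
    "......",
    ".####.",
    ".####.",
    ".####.",
])
handle = mask_from_rows([
    "......",
    ".....#",
    ".....#",
    "......",
])

# The coarse, object-level mask is the union of the fine-grained part masks.
teapot = union([lid, bowl, handle])

for name, part in [("lid", lid), ("bowl", bowl), ("handle", handle)]:
    print(f"IoU({name}, teapot) = {iou(part, teapot):.3f}")
```

A part mask overlaps the object mask only partially, so its IoU against the whole object is below 1, while the union of all parts recovers the object-level mask. A coarse segmentation returns only `teapot`; a fine-grained one returns the individual parts as well.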

A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering

Principles, applications, and advancements of the Segment Anything Model

Segment Anything for Videos: A Systematic Survey

SAM Fails to Segment Anything? – SAM-Adapter: Adapting SAM in Underperformed Scenes: Camouflage, Shadow, Medical Image Segmentation, and More

On Efficient Variants of Segment Anything Model: A Survey

Inspiring the Next Generation of Segment Anything Models: Comprehensively Evaluate SAM and SAM 2 with Diverse Prompts Towards Context-Dependent Concepts under Different Scenes

Towards Segment Anything Model (SAM) for Medical Image Segmentation: A Survey

From SAM to SAM 2: Exploring Improvements in Meta's Segment Anything Model

AI-SAM: Automatic and Interactive Segment Anything Model

SAM 2: Segment Anything in Images and Videos

Segment Anything Is Not Always Perfect: An Investigation of SAM on Different Real-world Applications

Segment Any Medical Model Extended

SAM-Adapter: Adapting Segment Anything in Underperformed Scenes

How Segment Anything Model (SAM) Boost Medical Image Segmentation?

Segment anything model for medical image segmentation: Current applications and future directions

Stable Segment Anything Model

Semantic-SAM: Segment and Recognize Anything at Any Granularity

Segment Anything with Multiple Modalities

The Segment Anything Model (SAM) for Remote Sensing Applications: From Zero to One Shot