Abstract: As a foundational model, SAM has significantly influenced multiple fields within computer vision, and its upgraded version, SAM 2, adds video segmentation capabilities and is poised to make a substantial impact once again. While SAMs (SAM and SAM 2) have demonstrated excellent performance in segmenting context-independent concepts such as people, cars, and roads, they overlook more challenging context-dependent (CD) concepts, such as visual saliency, camouflage, product defects, and medical lesions. CD concepts rely heavily on global and local contextual information, which makes them susceptible to shifts across contexts and demands strong discriminative capabilities from the model. The lack of a comprehensive evaluation of SAMs limits understanding of their performance boundaries, which may hinder the design of future models. In this paper, we conduct a thorough quantitative evaluation of SAMs on 11 CD concepts across 2D and 3D images and videos in various visual modalities within natural, medical, and industrial scenes. We develop a unified evaluation framework for SAM and SAM 2 that supports manual, automatic, and intermediate self-prompting, aided by our specific prompt generation and interaction strategies. We further explore the potential of SAM 2 for in-context learning and introduce prompt robustness testing to simulate imperfect real-world prompts. Finally, we analyze the benefits and limitations of SAMs in understanding CD concepts and discuss their future development in segmentation tasks. This work aims to provide valuable insights to guide future research on the segmentation of both context-independent and context-dependent concepts, potentially informing the development of the next version, SAM 3.

SAMEdge: An Edge-cloud Video Analytics Architecture for the Segment Anything Model

AyE-Edge: Automated Deployment Space Search Empowering Accuracy Yet Efficient Real-Time Object Detection on the Edge

EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM

SAM-Adapter: Adapting Segment Anything in Underperformed Scenes

SAM Fails to Segment Anything? – SAM-Adapter: Adapting SAM in Underperformed Scenes: Camouflage, Shadow, Medical Image Segmentation, and More

A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering

Edge-Based Video Analytics: A Survey

Edge-Cloud Collaborative Streaming Video Analytics with Multi-agent Deep Reinforcement Learning

AI-SAM: Automatic and Interactive Segment Anything Model

Crack-EdgeSAM Self-Prompting Crack Segmentation System for Edge Devices

Inspiring the Next Generation of Segment Anything Models: Comprehensively Evaluate SAM and SAM 2 with Diverse Prompts Towards Context-Dependent Concepts under Different Scenes

Edge Computing Enabled Real-Time Video Analysis via Adaptive Spatial-Temporal Semantic Filtering

OsmoticGate: Adaptive Edge-Based Real-Time Video Analytics for the Internet of Things

Edge Video Analytics for Public Safety: A Review

RAP-SAM: Towards Real-Time All-Purpose Segment Anything

CloudEye: A New Paradigm of Video Analysis System for Mobile Visual Scenarios

ECCVideo: A Scalable Edge Cloud Collaborative Video Analysis System

From SAM to SAM 2: Exploring Improvements in Meta's Segment Anything Model

TinySAM: Pushing the Envelope for Efficient Segment Anything Model

EdgeVision: Towards Collaborative Video Analytics on Distributed Edges for Performance Maximization