Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes

Zhi Cai, Yingjie Gao, Yaoyan Zheng, Nan Zhou, Di Huang

2024-07-19

Abstract: In computer vision, object detection is an important task that finds its application in many scenarios. However, obtaining extensive labels can be challenging, especially in crowded scenes. Recently, the Segment Anything Model (SAM) has been proposed as a powerful zero-shot segmenter, offering a novel approach to instance segmentation tasks. However, the accuracy and efficiency of SAM and its variants are often compromised when handling objects in crowded and occluded scenes. In this paper, we introduce Crowd-SAM, a SAM-based framework designed to enhance SAM's performance in crowded and occluded scenes with the cost of few learnable parameters and minimal labeled images. We introduce an efficient prompt sampler (EPS) and a part-whole discrimination network (PWD-Net), enhancing mask selection and accuracy in crowded scenes. Despite its simplicity, Crowd-SAM rivals state-of-the-art (SOTA) fully-supervised object detection methods on several benchmarks including CrowdHuman and CityPersons. Our code is available at https://github.com/FelixCaae/CrowdSAM.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses object detection in crowded scenes, where objects are dense and frequently occluded. Traditional object detection methods often require large amounts of labeled data for training, which is both time-consuming and costly to obtain. The paper therefore focuses on improving the accuracy and efficiency of object detection in crowded scenes without a large amount of labeled data.

To this end, the authors propose Crowd-SAM, a framework built on the Segment Anything Model (SAM). It aims to enhance SAM's performance in crowded scenes using few learnable parameters and minimal labeled images by introducing the Efficient Prompt Sampler (EPS) and the Part-Whole Discrimination Network (PWD-Net). These components improve mask selection and accuracy in crowded scenes.

The main contributions of Crowd-SAM are:

1. Proposing Crowd-SAM, a self-prompted segmentation method for labeling images containing clustered objects, which can produce accurate results with only a few examples.
2. Designing two new components of Crowd-SAM, namely EPS and PWD-Net, which effectively unleash the capabilities of SAM in crowded scenes.
3. Conducting comprehensive experiments on two benchmarks, demonstrating the effectiveness and generalization ability of Crowd-SAM.

With these innovations, Crowd-SAM achieves performance comparable to fully-supervised object detection methods on public benchmarks such as CrowdHuman and CityPersons, while remaining simple and fast to train.
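The summary above does not say how mask selection is actually carried out. As a rough illustration of the general idea only, the sketch below filters candidate masks by a quality score and then greedily drops near-duplicates by mask IoU. It is not the authors' EPS or PWD-Net: the function names, the thresholds, and the hand-written scores (which stand in for whatever a learned part-whole scorer might output) are all assumptions made for this example.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]  # a mask is modeled as the set of (row, col) pixels it covers


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two pixel-set masks."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def select_masks(masks: List[Mask], scores: List[float],
                 score_thresh: float = 0.5, iou_thresh: float = 0.5) -> List[int]:
    """Hypothetical selection step: keep high-scoring, mutually distinct masks.

    `scores` is an assumed per-mask quality score (higher = more likely a
    whole object rather than a part). Returns indices of the kept masks,
    ordered by descending score.
    """
    order = sorted(range(len(masks)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if scores[i] < score_thresh:
            break  # all remaining candidates score even lower
        if all(mask_iou(masks[i], masks[j]) < iou_thresh for j in kept):
            kept.append(i)
    return kept


def rect(top: int, left: int, bottom: int, right: int) -> Mask:
    """Helper for the toy example: a filled rectangle as a pixel set."""
    return {(r, c) for r in range(top, bottom) for c in range(left, right)}


# Toy crowded scene: two overlapping people plus two spurious candidates.
whole_person = rect(0, 0, 10, 4)    # full-body mask
near_duplicate = rect(0, 0, 9, 4)   # almost the same mask (IoU 0.9)
torso_only = rect(2, 0, 6, 4)       # a part of the first person
neighbor = rect(0, 3, 10, 7)        # second person, slightly overlapping

masks = [whole_person, near_duplicate, torso_only, neighbor]
scores = [0.9, 0.8, 0.2, 0.85]      # made-up scores for illustration

print(select_masks(masks, scores))  # → [0, 3]
```

In this toy scene the near-duplicate is suppressed by the IoU check and the torso-only mask by the score threshold, while the slightly overlapping neighbor survives. Keeping neighbors that overlap is the behavior a crowded-scene detector needs, and an overly aggressive IoU threshold would break it.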

SAM-Adapter: Adapting Segment Anything in Underperformed Scenes

Robust Zero-Shot Crowd Counting and Localization With Adaptive Resolution SAM

SAM Fails to Segment Anything? – SAM-Adapter: Adapting SAM in Underperformed Scenes: Camouflage, Shadow, Medical Image Segmentation, and More

AM-SAM: Automated Prompting and Mask Calibration for Segment Anything Model

Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection

EM-SAM: Eye-Movement-Guided Segment Anything Model for Object Detection and Recognition in Complex Scenes

Point-SAM: Promptable 3D Segmentation Model for Point Clouds

A Self-Training Approach for Point-Supervised Object Detection and Counting in Crowds

SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Scene Segmentation

Can SAM Segment Anything? When SAM Meets Camouflaged Object Detection

PosSAM: Panoptic Open-vocabulary Segment Anything

PointSAM: Pointly-Supervised Segment Anything Model for Remote Sensing Images

Evaluating SAM2's Role in Camouflaged Object Detection: From SAM to SAM2

SAMNet: Stereoscopically Attentive Multi-Scale Network for Lightweight Salient Object Detection

FocSAM: Delving Deeply into Focused Objects in Segmenting Anything

PA-SAM: Prompt Adapter SAM for High-Quality Image Segmentation

MeSAM: Multiscale Enhanced Segment Anything Model for Optical Remote Sensing Images

Can SAM Count Anything? An Empirical Study on SAM Counting

Pro2SAM: Mask Prompt to SAM with Grid Points for Weakly Supervised Object Localization

Boosting Segment Anything Model Towards Open-Vocabulary Learning