When SAM2 Meets Video Camouflaged Object Segmentation: A Comprehensive Evaluation and Adaptation

Yuli Zhou, Guolei Sun, Yawei Li, Luca Benini, Ender Konukoglu
2024-09-27
Abstract: This study investigates the application and performance of the Segment Anything Model 2 (SAM2) in the challenging task of video camouflaged object segmentation (VCOS). VCOS involves detecting objects in videos that blend seamlessly into their surroundings because of similar colors and textures, poor lighting conditions, and similar factors. Compared with objects in normal scenes, camouflaged objects are much more difficult to detect. SAM2, a video foundation model, has shown potential in various tasks, but its effectiveness in dynamic camouflaged scenarios remains under-explored. This work presents a comprehensive study of SAM2's ability in VCOS. First, we assess SAM2's performance on camouflaged video datasets using different models and prompts (click, box, and mask). Second, we explore the integration of SAM2 with existing multimodal large language models (MLLMs) and VCOS methods. Third, we specifically adapt SAM2 by fine-tuning it on a video camouflaged dataset. Our comprehensive experiments demonstrate that SAM2 has excellent zero-shot ability to detect camouflaged objects in videos. We also show that this ability can be further improved by specifically adjusting SAM2's parameters for VCOS. The code will be available at https://github.com/zhoustan/SAM2-VCOS
Computer Vision and Pattern Recognition, Artificial Intelligence
### What problems does this paper attempt to solve?

This paper explores and evaluates the application and performance of Segment Anything Model 2 (SAM2) in the task of Video Camouflaged Object Segmentation (VCOS). Specifically, it addresses the following problems:

1. **Evaluating the zero-shot ability of SAM2**:
   - Study how well SAM2 segments camouflaged objects without additional training.
   - Analyze SAM2's detection of camouflaged objects in automatic and semi-supervised modes.
2. **Combining multimodal large language models (MLLMs) and existing VCOS methods**:
   - Explore combining SAM2 with existing MLLMs and VCOS methods to improve the accuracy of camouflaged object segmentation.
   - Design specific prompting strategies so that the bounding boxes generated by MLLMs can be used as input prompts for SAM2.
3. **Fine-tuning SAM2 for VCOS**:
   - Fine-tune SAM2 on the large-scale camouflaged object dataset MoCA-Mask to enhance its segmentation performance in dynamic camouflage scenarios.
   - Adjust the parameters of SAM2 to make it better suited to the complex backgrounds and low-contrast environments in which camouflaged objects appear.

### Main contributions

- **The first comprehensive evaluation of SAM2's performance in VCOS**: The performance of SAM2 in automatic and semi-supervised modes is verified experimentally.
- **A new strategy for combining SAM2 with existing methods**: Prompt-driven refinement is shown to significantly improve segmentation accuracy.
- **Enhancing SAM2's capabilities through fine-tuning**: Fine-tuning SAM2 on the MoCA-Mask dataset achieves state-of-the-art segmentation results.
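The prompt types discussed above (click, box, and boxes produced by an MLLM) can be illustrated with a small sketch. This is not the authors' code: the summary does not describe how the prompts are actually generated, so the helper names `mask_to_box`, `mask_to_click`, and `scale_normalized_box` are hypothetical, and the assumption that an MLLM returns a box as normalized `(x0, y0, x1, y1)` values in `[0, 1]` is ours. The sketch only shows how a binary mask or a normalized box might be turned into pixel-space prompts before being handed to a promptable segmenter.

```python
import numpy as np


def mask_to_box(mask):
    """Tight bounding box (x_min, y_min, x_max, y_max) of a binary mask."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask contains no foreground pixels")
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


def mask_to_click(mask):
    """One foreground click (x, y): the mask pixel nearest the mask centroid.

    Snapping to a real mask pixel keeps the click on the object even when
    the centroid of a non-convex shape falls on the background.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask contains no foreground pixels")
    cy, cx = ys.mean(), xs.mean()
    nearest = np.argmin((ys - cy) ** 2 + (xs - cx) ** 2)
    return int(xs[nearest]), int(ys[nearest])


def scale_normalized_box(box, width, height):
    """Convert an assumed normalized (x0, y0, x1, y1) box to pixel coordinates.

    Coordinates are rounded and clipped so they stay inside the image.
    """
    x0, y0, x1, y1 = box
    px = np.clip(np.round([x0 * width, x1 * width]), 0, width - 1)
    py = np.clip(np.round([y0 * height, y1 * height]), 0, height - 1)
    return int(px[0]), int(py[0]), int(px[1]), int(py[1])


# Toy example: a 6x8 frame with a rectangular "object".
mask = np.zeros((6, 8), dtype=np.uint8)
mask[2:5, 3:7] = 1

box_prompt = mask_to_box(mask)
click_prompt = mask_to_click(mask)
mllm_box = scale_normalized_box((0.25, 0.5, 0.75, 1.0), width=8, height=6)
```

In a semi-supervised setting, prompts like these would typically be supplied on the first frame only, with the video model propagating the resulting mask through the remaining frames.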
### Conclusion

Through systematic experiments and analysis, the paper demonstrates the potential of SAM2 for video camouflaged object segmentation and proposes methods for further improvement, providing a valuable reference for future research.