Exploring Deeper! Segment Anything Model with Depth Perception for Camouflaged Object Detection

Zhenni Yu, Xiaoqin Zhang, Li Zhao, Yi Bin, Guobao Xiao
2024-07-17
Abstract: This paper introduces a new Segment Anything Model with Depth Perception (DSAM) for Camouflaged Object Detection (COD). DSAM exploits the zero-shot capability of SAM to realize precise segmentation in the RGB-D domain. It consists of the Prompt-Deeper Module and the Finer Module. The Prompt-Deeper Module utilizes knowledge distillation and the Bias Correction Module to achieve interaction between RGB features and depth features, in particular using depth features to correct erroneous parts of the RGB features. The interacted features are then combined with the box prompt in SAM to create a prompt with depth perception. The Finer Module explores the possibility of accurately segmenting highly camouflaged targets from a depth perspective. It uncovers depth cues in areas missed by SAM through mask reversion, self-filtering, and self-attention operations, compensating for SAM's defects in the COD domain. DSAM represents the first step towards a SAM-based RGB-D COD model. It maximizes the utilization of depth features while synergizing with RGB features to achieve multimodal complementarity, thereby overcoming the segmentation limitations of SAM and improving its accuracy in COD. Experimental results on COD benchmarks demonstrate that DSAM achieves excellent segmentation performance and reaches state-of-the-art (SOTA) results while consuming fewer training resources. The code will be available at https://github.com/guobaoxiao/DSAM.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the poor segmentation performance of the Segment Anything Model (SAM) in Camouflaged Object Detection (COD), which stems from the high similarity between camouflaged objects and their background. SAM segments mainly from RGB images, so when objects are highly camouflaged it cannot effectively extract semantic and structural information, and segmentation accuracy suffers.

To solve this problem, the authors propose a new SAM-based model, the Segment Anything Model with Depth Perception (DSAM), which aims to improve the segmentation of highly camouflaged objects by introducing depth information. DSAM achieves this goal through two modules:

1. **Prompt-Deeper Module (PDM)**: This module uses knowledge distillation and the Bias Correction Module (BCM) to realize interaction between RGB features and depth features, in particular using depth features to correct erroneous parts of the RGB features. The interacted features are then combined with the box prompt in SAM to generate prompts with depth perception.
2. **Finer Module (FM)**: This module explores the possibility of accurately segmenting highly camouflaged objects from a depth perspective. Through mask reversion, self-filtering, and self-attention operations, FM uncovers depth cues in areas missed by SAM and compensates for SAM's defects in the COD field.

Through the synergy of these two modules, DSAM maximizes its use of depth features and complements them with RGB features, thereby overcoming the segmentation limitations of SAM in the COD field and improving its accuracy on COD tasks. Experimental results show that DSAM achieves excellent segmentation performance on multiple COD benchmark datasets and reaches the state-of-the-art (SOTA) level while consuming fewer training resources.
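
To make the idea of mask reversion more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the function names `revert_mask` and `weight_by_reverted_mask`, the use of nested lists as feature maps, and the simple element-wise weighting are all assumptions made for illustration. It shows only the general idea of inverting a coarse foreground mask so that depth features are emphasized where the first prediction was least confident; the self-filtering and self-attention steps are not modeled here.

```python
def revert_mask(mask):
    """Invert a foreground probability map (values in [0, 1]).

    High values in the result mark regions the coarse mask left out.
    Hypothetical helper, not taken from the DSAM code base.
    """
    return [[1.0 - p for p in row] for row in mask]


def weight_by_reverted_mask(depth_feat, mask):
    """Scale a depth feature map by the reverted mask, element-wise.

    Assumption: depth_feat and mask are 2-D lists of the same shape.
    Regions already covered by the mask are suppressed, and regions
    it missed keep most of their depth response.
    """
    reverted = revert_mask(mask)
    return [
        [d * r for d, r in zip(d_row, r_row)]
        for d_row, r_row in zip(depth_feat, reverted)
    ]


# Toy 2x2 example: a coarse mask and a depth feature map.
coarse_mask = [[1.0, 0.25], [0.0, 0.5]]
depth_feat = [[2.0, 4.0], [3.0, 8.0]]
print(weight_by_reverted_mask(depth_feat, coarse_mask))
# prints [[0.0, 3.0], [3.0, 4.0]]
```

In this toy example the top-left position, which the coarse mask already marks as foreground, is zeroed out, while the bottom-left position, which the mask missed entirely, keeps its full depth response.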