Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection

Shixuan Gao, Pingping Zhang, Tianyu Yan, Huchuan Lu

2024-08-08

Abstract: Salient Object Detection (SOD) aims to identify and segment the most prominent objects in images. Advanced SOD methods often use Convolutional Neural Networks (CNNs) or Transformers for deep feature extraction. However, these methods still deliver low performance and poor generalization in complex cases. Recently, the Segment Anything Model (SAM) has been proposed as a vision foundation model with strong segmentation and generalization capabilities. Nonetheless, SAM requires accurate prompts for target objects, which are unavailable in SOD. Additionally, SAM does not exploit multi-scale and multi-level information, nor does it incorporate fine-grained details. To address these shortcomings, we propose a Multi-scale and Detail-enhanced SAM (MDSAM) for SOD. Specifically, we first introduce a Lightweight Multi-Scale Adapter (LMSA), which allows SAM to learn multi-scale information with very few trainable parameters. Then, we propose a Multi-Level Fusion Module (MLFM) to comprehensively utilize the multi-level information from SAM's encoder. Finally, we propose a Detail Enhancement Module (DEM) to enrich SAM with fine-grained details. Experimental results demonstrate the superior performance of our model on multiple SOD datasets and its strong generalization to other segmentation tasks. The source code is released at https://github.com/BellyBeauty/MDSAM.

Computer Vision and Pattern Recognition, Multimedia

What problem does this paper attempt to address?

The paper aims to address several key issues in the task of Salient Object Detection (SOD):

1. **Problems with existing methods**: Although current advanced SOD methods utilize various Convolutional Neural Networks (CNNs) or Transformers for deep feature extraction, they still exhibit low performance and poor generalization ability in complex scenes.
2. **Limitations of the Segment Anything Model (SAM)**: Despite SAM being a powerful vision foundation model with strong segmentation and generalization capabilities in image segmentation tasks, it has two main limitations in SOD applications:
   - It requires precise prompts for target objects (such as points, boxes, or rough masks), which are usually unavailable in SOD tasks.
   - It lacks the utilization of multi-scale and multi-level information and the fusion of fine-grained details.

To address the above issues, the authors propose a new framework called "Multi-Scale and Detail-Enhanced Segment Anything Model (MDSAM)" to improve the performance of SOD tasks. Specifically, MDSAM addresses these issues through the following three modules:

1. **Lightweight Multi-Scale Adapter (LMSA)**: Allows SAM to learn multi-scale information while maintaining very few trainable parameters.
2. **Multi-Level Fusion Module (MLFM)**: Fully utilizes information from different levels of the SAM encoder to improve the model's perception of multi-scale information.
3. **Detail Enhancement Module (DEM)**: Improves SAM's segmentation results by integrating fine-grained details, thereby enhancing the accuracy of SOD.

Through these modules, MDSAM can effectively identify and segment the most salient objects in images without relying on precise prompts and demonstrates superior performance on multiple SOD datasets.
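
To give a feel for what "very few trainable parameters" can mean for an adapter, the following is a minimal arithmetic sketch. It does not reproduce the LMSA design, which the summary does not specify; it only counts the parameters of a generic bottleneck adapter (a down-projection followed by an up-projection) next to one frozen transformer block. The token width of 768, the MLP ratio of 4, and the bottleneck width of 64 are illustrative assumptions, not figures from the paper.

```python
# Hypothetical sizes: the summary above does not state any of these numbers.
EMBED_DIM = 768    # assumed token width of the frozen encoder
MLP_RATIO = 4      # assumed MLP expansion ratio inside a transformer block
BOTTLENECK = 64    # assumed bottleneck width of the adapter


def linear_params(d_in: int, d_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return d_in * d_out + d_out


def transformer_block_params(dim: int, mlp_ratio: int) -> int:
    """Attention (qkv + output projection) plus a two-layer MLP; norms omitted."""
    attention = linear_params(dim, 3 * dim) + linear_params(dim, dim)
    mlp = linear_params(dim, mlp_ratio * dim) + linear_params(mlp_ratio * dim, dim)
    return attention + mlp


def bottleneck_adapter_params(dim: int, bottleneck: int) -> int:
    """Down-projection followed by up-projection, as in a generic adapter."""
    return linear_params(dim, bottleneck) + linear_params(bottleneck, dim)


block = transformer_block_params(EMBED_DIM, MLP_RATIO)
adapter = bottleneck_adapter_params(EMBED_DIM, BOTTLENECK)
ratio = adapter / block

print(f"frozen block: {block:,} parameters")
print(f"adapter:      {adapter:,} parameters ({ratio:.2%} of the block)")
```

Under these assumed sizes the adapter adds roughly 1.4% of the parameters of the block it is attached to, which is the general reason adapter-style tuning keeps the trainable parameter count small while the backbone stays frozen.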

Adapting Segment Anything Model to Multi-modal Salient Object Detection with Semantic Feature Fusion Guidance

SAM Fails to Segment Anything? – SAM-Adapter: Adapting SAM in Underperformed Scenes: Camouflage, Shadow, Medical Image Segmentation, and More

SAM-Adapter: Adapting Segment Anything in Underperformed Scenes

MeSAM: Multiscale Enhanced Segment Anything Model for Optical Remote Sensing Images

Exploring Deeper! Segment Anything Model with Depth Perception for Camouflaged Object Detection

Weakly supervised salient object detection via bounding-box annotation and SAM model

Customize Segment Anything Model for Multi-Modal Semantic Segmentation with Mixture of LoRA Experts

MAS-SAM: Segment Any Marine Animal with Aggregated Features

SAMNet: Stereoscopically Attentive Multi-Scale Network for Lightweight Salient Object Detection

Segment Anything with Multiple Modalities

MSRMNet: Multi-scale skip residual and multi-mixed features network for salient object detection

AM-SAM: Automated Prompting and Mask Calibration for Segment Anything Model

Customized Segment Anything Model for Medical Image Segmentation

SSFam: Scribble Supervised Salient Object Detection Family

SLMSF-Net: A Semantic Localization and Multi-Scale Fusion Network for RGB-D Salient Object Detection

Boosting Segment Anything Model Towards Open-Vocabulary Learning

Can SAM Segment Anything? When SAM Meets Camouflaged Object Detection

Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes

Semantic-SAM: Segment and Recognize Anything at Any Granularity

Tuning a SAM-Based Model with Multi-Cognitive Visual Adapter to Remote Sensing Instance Segmentation