TS-SAM: Fine-Tuning Segment-Anything Model for Downstream Tasks

Yang Yu, Chen Xu, Kai Wang
2024-08-04
Abstract: Adapter-based fine-tuning has been studied for improving the performance of SAM on downstream tasks. However, a significant performance gap remains between fine-tuned SAMs and domain-specific models. To reduce this gap, we propose Two-Stream SAM (TS-SAM). On the one hand, inspired by the side network in Parameter-Efficient Fine-Tuning (PEFT), we design a lightweight Convolutional Side Adapter (CSA) that integrates the powerful features from SAM into side-network training for comprehensive feature fusion. On the other hand, in line with the characteristics of segmentation tasks, we design a Multi-scale Refinement Module (MRM) and a Feature Fusion Decoder (FFD) to preserve both detailed and semantic features. Extensive experiments on ten public datasets from three tasks demonstrate that TS-SAM not only significantly outperforms the recently proposed SAM-Adapter and SSOM, but also achieves performance competitive with SOTA domain-specific models. Our code is available at: https://github.com/maoyangou147/TS-SAM.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper aims to improve the performance of the Segment-Anything Model (SAM) on downstream tasks, focusing on three challenging ones: Camouflaged Object Detection (COD), Shadow Detection, and Salient Object Detection (SOD). Because SAM performs poorly on these tasks, the authors propose Two-Stream SAM (TS-SAM). Its key contributions are:

1. **Introducing side networks into SAM fine-tuning for the first time**: a lightweight Convolutional Side Adapter (CSA) effectively extracts features from the SAM encoder and adapts them to different downstream tasks.
2. **A Multi-scale Refinement Module (MRM) and a Feature Fusion Decoder (FFD) tailored to segmentation**: together, these modules capture fine detail in high-resolution images and fully integrate it during decoding, producing more precise segmentation masks.
3. **Extensive experimental validation**: on 10 public datasets spanning the three tasks, TS-SAM significantly outperforms recently proposed methods such as SAM-Adapter and SSOM, and performs competitively with state-of-the-art domain-specific models designed for each task.

Taken together, these designs let TS-SAM substantially improve SAM's performance on a range of downstream tasks while remaining lightweight.
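The two-stream idea can be illustrated with a toy, dependency-free sketch of generic side-network fusion. This is not the authors' CSA: the summary gives no layer-level details, so the 1-D signals, the fixed smoothing kernel standing in for the SAM encoder (assumed frozen here, as is usual in side-network PEFT), the per-layer fusion rule `s_l = relu(conv(s_{l-1}, k_l)) + f_l`, and the names `frozen_encoder` and `ConvSideStream` are all hypothetical. What it shows is the structure: the backbone stream runs unchanged and exposes its per-layer features, while a separate, much smaller convolutional stream taps those features and holds the only trainable parameters.

```python
from typing import List

Vector = List[float]


def conv1d_same(x: Vector, kernel: Vector) -> Vector:
    """1-D cross-correlation (as in deep-learning libraries) with zero 'same' padding.

    Assumes an odd kernel length so the output has the same length as x.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(x))
    ]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def frozen_encoder(x: Vector, num_layers: int) -> List[Vector]:
    """Hypothetical stand-in for a frozen SAM image encoder.

    Each 'layer' is a fixed smoothing step with no trainable weights;
    the output of every layer is returned so the side stream can tap it.
    """
    feats: List[Vector] = []
    h = x
    for _ in range(num_layers):
        h = conv1d_same(h, [0.25, 0.5, 0.25])
        feats.append(h)
    return feats


class ConvSideStream:
    """Hypothetical lightweight side network: one small trainable kernel per layer.

    At layer l the side state is updated as
        s_l = relu(conv(s_{l-1}, k_l)) + f_l,
    where f_l is the frozen encoder feature from layer l.
    """

    def __init__(self, num_layers: int, kernel_size: int = 3) -> None:
        # Identity-like initialisation: only the centre tap is non-zero.
        centre = kernel_size // 2
        self.kernels = [
            [1.0 if j == centre else 0.0 for j in range(kernel_size)]
            for _ in range(num_layers)
        ]

    def num_trainable(self) -> int:
        # Only the side-stream kernels would be updated during fine-tuning.
        return sum(len(k) for k in self.kernels)

    def forward(self, x: Vector, frozen_feats: List[Vector]) -> Vector:
        s = x
        for kernel, f in zip(self.kernels, frozen_feats):
            # Fuse the backbone feature of this layer into the side state.
            s = [a + b for a, b in zip(relu(conv1d_same(s, kernel)), f)]
        return s


image = [0.0, 1.0, 2.0, 1.0, 0.0]
feats = frozen_encoder(image, num_layers=4)
side = ConvSideStream(num_layers=4)
out = side.forward(image, feats)
print(len(out), side.num_trainable())  # 5 12
```

In this sketch only the 12 kernel values of `ConvSideStream` would be trained, while the backbone stream is never modified; no training loop is shown. Keeping the trainable part in a separate, small stream is the general reason side-network approaches are regarded as parameter-efficient.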