SU-SAM: A Simple Unified Framework for Adapting Segment Anything Model in Underperformed Scenes

Yiran Song, Qianyu Zhou, Xuequan Lu, Zhiwen Shao, Lizhuang Ma
2024-07-29
Abstract: Segment anything model (SAM) has demonstrated excellent generalizability in common vision scenarios, yet it falls short in understanding specialized data. Recently, several methods have combined parameter-efficient techniques with task-specific designs to fine-tune SAM on particular tasks. However, these methods rely heavily on handcrafted, complicated, task-specific designs and on pre/post-processing to achieve acceptable performance on downstream tasks. This severely restricts their generalizability to other downstream tasks. To address this issue, we present a simple and unified framework, namely SU-SAM, that can easily and efficiently fine-tune the SAM model with parameter-efficient techniques while maintaining excellent generalizability toward various downstream tasks. SU-SAM does not require any task-specific designs and aims to significantly improve the adaptability of SAM-like models toward underperformed scenes. Concretely, we abstract the parameter-efficient modules of different methods into basic design elements in our framework. In addition, we propose four variants of SU-SAM, i.e., series, parallel, mixed, and LoRA structures. We conduct comprehensive experiments on nine datasets and six downstream tasks to verify the effectiveness of SU-SAM, including medical image segmentation, camouflage object detection, salient object segmentation, surface defect segmentation, complex object shapes, and shadow masking. Our experimental results demonstrate that SU-SAM achieves competitive or superior accuracy compared to state-of-the-art methods. Furthermore, we provide in-depth analyses highlighting the effectiveness of different parameter-efficient designs within SU-SAM. Finally, we propose a generalized model and benchmark, showcasing SU-SAM's generalizability across all of these diverse datasets simultaneously.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the following problem: although the Segment Anything Model (SAM) performs well in common visual scenes, its ability to handle specialized data is limited. Some methods fine-tune SAM for specific tasks by combining parameter-efficient techniques with task-specific designs, but they rely on handcrafted, complicated, task-specific designs and on pre-processing and post-processing steps, which limits their generalization and makes them difficult to apply to other downstream tasks. To this end, the authors propose SU-SAM, a simple and unified framework that aims to fine-tune SAM easily and efficiently with parameter-efficient techniques while maintaining excellent generalization across downstream tasks. SU-SAM does not rely on any task-specific design. Instead, it abstracts the parameter-efficient modules of different methods into basic design elements, proposes four structural variants (series, parallel, mixed, and LoRA structures), and verifies their effectiveness through experiments on nine datasets and six downstream tasks.
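
To make the four structural variants more concrete, the following is a minimal NumPy sketch of how series, parallel, mixed, and LoRA-style modules are commonly attached to a frozen layer. It is an illustration under stated assumptions, not the authors' implementation: the layer `W`, the sizes `d` and `r`, the `Adapter` and `LoRA` classes, and in particular the way `mixed` combines the two adapter placements are all hypothetical choices, since the abstract does not specify where SU-SAM inserts its modules.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # hypothetical feature width and bottleneck rank

# Stand-in for one frozen, pretrained linear layer of the backbone.
W = rng.standard_normal((d, d))


def frozen_layer(x):
    return x @ W


class Adapter:
    """Bottleneck adapter: down-project, ReLU, up-project (trainable)."""

    def __init__(self, d, r, rng):
        self.down = rng.standard_normal((d, r)) * 0.01
        self.up = np.zeros((r, d))  # zero init: the adapter starts as a no-op

    def __call__(self, x):
        return np.maximum(x @ self.down, 0.0) @ self.up


def series(x, adapter):
    # Adapter is applied to the output of the frozen layer.
    h = frozen_layer(x)
    return h + adapter(h)


def parallel(x, adapter):
    # Adapter sees the same input as the frozen layer; outputs are summed.
    return frozen_layer(x) + adapter(x)


def mixed(x, a_series, a_parallel):
    # Assumed combination: a parallel branch followed by a series adapter.
    h = frozen_layer(x) + a_parallel(x)
    return h + a_series(h)


class LoRA:
    """Low-rank update: effective weight is W + scale * A @ B."""

    def __init__(self, d, r, rng, alpha=1.0):
        self.A = rng.standard_normal((d, r)) * 0.01
        self.B = np.zeros((r, d))  # zero init: the update starts at zero
        self.scale = alpha / r

    def __call__(self, x):
        return x @ W + (x @ self.A @ self.B) * self.scale


x = rng.standard_normal((4, d))
adapter = Adapter(d, r, rng)
lora = LoRA(d, r, rng)

trainable = adapter.down.size + adapter.up.size
print(f"frozen params: {W.size}, trainable adapter params: {trainable}")
```

In every variant only the small added matrices would be trained while `W` stays fixed, which is what makes the approach parameter-efficient. With the zero initialization used here, each variant initially reproduces the frozen layer's output, so fine-tuning starts from the pretrained behavior.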