SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation

Xinyu Xiong, Zihuang Wu, Shuangyi Tan, Wenxue Li, Feilong Tang, Ying Chen, Siying Li, Jie Ma, Guanbin Li
2024-08-17
Abstract: Image segmentation plays an important role in vision understanding. Recently, the emerging vision foundation models continuously achieved superior performance on various tasks. Following such success, in this paper, we prove that the Segment Anything Model 2 (SAM2) can be a strong encoder for U-shaped segmentation models. We propose a simple but effective framework, termed SAM2-UNet, for versatile image segmentation. Specifically, SAM2-UNet adopts the Hiera backbone of SAM2 as the encoder, while the decoder uses the classic U-shaped design. Additionally, adapters are inserted into the encoder to allow parameter-efficient fine-tuning. Preliminary experiments on various downstream tasks, such as camouflaged object detection, salient object detection, marine animal segmentation, mirror detection, and polyp segmentation, demonstrate that our SAM2-UNet can simply beat existing specialized state-of-the-art methods without bells and whistles. Project page: https://github.com/WZH0120/SAM2-UNet
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper asks whether a single general-purpose architecture can handle several challenging image segmentation tasks that are usually served by specialized methods: camouflaged object detection, salient object detection, marine animal segmentation, mirror detection, and polyp segmentation. It proposes a framework called SAM2-UNet, which uses Hiera, the hierarchical backbone of the Segment Anything Model 2 (SAM2), as the encoder and pairs it with a classic U-shaped decoder. Adapters inserted into the encoder allow parameter-efficient fine-tuning. Preliminary experiments show that SAM2-UNet outperforms existing specialized state-of-the-art methods on these five tasks, which suggests it is a strong candidate for both natural and medical image segmentation.
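To give a rough sense of why adapters make fine-tuning parameter-efficient, the sketch below counts the parameters of a bottleneck adapter against those of one encoder block. Everything in it is an assumption for illustration: the adapter is taken to be a down-projection followed by an up-projection, the block is a generic ViT-style block rather than the actual Hiera block, and the width 768 and bottleneck 32 are hypothetical values not taken from the paper. It also ignores the U-shaped decoder, which the abstract does not describe in enough detail to count.

```python
def linear_params(d_in: int, d_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return d_in * d_out + d_out


def adapter_params(dim: int, bottleneck: int) -> int:
    """Assumed adapter shape: project dim -> bottleneck, then bottleneck -> dim."""
    return linear_params(dim, bottleneck) + linear_params(bottleneck, dim)


def vit_block_params(dim: int, mlp_ratio: int = 4) -> int:
    """Generic ViT-style block (a stand-in, not the real Hiera block)."""
    # Attention: fused query/key/value projection plus output projection.
    attention = linear_params(dim, 3 * dim) + linear_params(dim, dim)
    # MLP: expand by mlp_ratio, then project back to dim.
    mlp = linear_params(dim, mlp_ratio * dim) + linear_params(mlp_ratio * dim, dim)
    # Two LayerNorms, each with a scale and a shift vector.
    layer_norms = 2 * 2 * dim
    return attention + mlp + layer_norms


def trainable_fraction(dim: int, bottleneck: int) -> float:
    """Share of parameters that train if the block is frozen and only the adapter learns."""
    adapter = adapter_params(dim, bottleneck)
    return adapter / (adapter + vit_block_params(dim))


if __name__ == "__main__":
    dim, bottleneck = 768, 32  # hypothetical sizes, chosen only for illustration
    print("adapter parameters:", adapter_params(dim, bottleneck))
    print("frozen block parameters:", vit_block_params(dim))
    print(f"trainable share: {trainable_fraction(dim, bottleneck):.2%}")
```

Under these assumed sizes the adapter adds well under 1% to the parameters of the block it accompanies, which is the general reason freezing the backbone and training only adapters is cheap. The real ratio for SAM2-UNet depends on the Hiera variant and adapter design used in the paper.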