Abstract: Recently, the Segment Anything Model (SAM) has demonstrated promising segmentation capabilities in a variety of downstream segmentation tasks. However, in the context of universal medical image segmentation, there is a notable performance discrepancy when SAM is applied directly, due to the domain gap between natural images and 2D/3D medical data. In this work, we propose a dual-branch adapted SAM framework, named DB-SAM, that strives to effectively bridge this domain gap. Our dual-branch adapted SAM contains two branches in parallel: a ViT branch and a convolution branch. The ViT branch incorporates a learnable channel attention block after each frozen attention block, which captures domain-specific local features. The convolution branch, on the other hand, employs a lightweight convolutional block to extract domain-specific shallow features from the input medical image. To perform cross-branch feature fusion, we design a bilateral cross-attention block and a ViT convolution fusion block, which dynamically combine diverse information from the two branches for the mask decoder. Extensive experiments on a large-scale medical image dataset with various 3D and 2D medical segmentation tasks reveal the merits of our proposed contributions. On 21 3D medical image segmentation tasks, our proposed DB-SAM achieves an absolute gain of 8.8% compared to a recent medical SAM adapter in the literature. The code and model are available at https://github.com/AlfredQin/DB-SAM.

What problem does this paper attempt to address?

This paper addresses the performance degradation that occurs when the existing Segment Anything Model (SAM) is applied directly to 2D and 3D medical images in universal medical image segmentation. Because of the large domain gap between natural images and 2D/3D medical images, using SAM directly for medical image segmentation leads to a significant drop in segmentation quality. The paper therefore proposes a dual-branch adaptation framework, DB-SAM, which aims to bridge this domain gap and improve the performance of SAM on medical image segmentation tasks.

### Main contributions of the paper

1. **Dual-branch framework**: DB-SAM contains two parallel branches, a ViT branch and a convolutional branch. The ViT branch captures domain-specific local features by inserting a learnable channel-attention block after each frozen attention block. The convolutional branch uses lightweight convolutional blocks to extract domain-specific shallow features from the input medical images.
2. **Cross-branch feature fusion**: Bilateral cross-attention blocks and ViT-convolution fusion blocks dynamically combine the diverse information from the two branches for the mask decoder.
3. **Experimental verification**: Extensive experiments were carried out on a large-scale medical image dataset covering a variety of 3D and 2D medical segmentation tasks. On 21 3D medical image segmentation tasks, DB-SAM achieves an absolute gain of 8.8% over a recent medical SAM adapter.

### Formula representation

- **Channel-attention block**:

  \[ F_{\text{out}} = F_{\text{vit}} + \text{Conv}_{1\times1}(\text{SE}(\text{DWConv}_{3\times3}(\text{LN}(F_{\text{vit}})))) \]

  where \( F_{\text{vit}} \) represents the input embedding from the ViT attention block, \(\text{LN}\) represents layer normalization, \(\text{DWConv}_{3\times3}\) represents depth-wise convolution, \(\text{SE}\) represents the squeeze-and-excitation block, and \(\text{Conv}_{1\times1}\) represents point-wise convolution.

- **Final fusion output**:

  \[ F_{\text{output}} = F_{o}^d \otimes M + F_{o}^s \otimes (1 - M) \]

  where \( F_{o}^d \) and \( F_{o}^s \) represent the features of the ViT branch and the convolutional branch respectively, \(\otimes\) represents element-wise multiplication, and \( M \) is a selective mask generated by the sigmoid function.

### Conclusion

DB-SAM significantly improves the performance of SAM in medical image segmentation tasks by introducing a dual-branch framework and an effective feature fusion mechanism, especially when dealing with small organs and organs with complex shapes.
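
To make the two formulas in the Formula representation section concrete, here is a minimal NumPy sketch of the channel-attention residual and the mask-gated fusion. It is an illustration under stated assumptions, not the authors' implementation: features are stored as `(H, W, C)` arrays, biases and the learnable LN scale and shift are omitted, the SE block follows the common pool, FC, ReLU, FC, sigmoid layout, and the mask \( M \) is taken as the sigmoid of a hypothetical `gate_logits` array, since the summary does not say what the sigmoid is applied to. All function and parameter names are made up for this example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def layer_norm(x, eps=1e-6):
    """LN over the channel (last) axis; learnable scale/shift omitted (assumption)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def depthwise_conv3x3(x, kernels):
    """Depth-wise 3x3 convolution, stride 1, zero padding.

    x: (H, W, C); kernels: (3, 3, C), one 3x3 filter per channel.
    """
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros(x.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w, :] * kernels[dy, dx, :]
    return out


def squeeze_excite(x, w1, w2):
    """Standard SE layout (assumed): global average pool -> FC -> ReLU -> FC -> sigmoid."""
    squeezed = x.mean(axis=(0, 1))            # (C,)
    hidden = np.maximum(squeezed @ w1, 0.0)   # (C // r,)
    channel_gate = sigmoid(hidden @ w2)       # (C,)
    return x * channel_gate                   # rescale each channel


def channel_attention_block(f_vit, dw_kernels, se_w1, se_w2, pw_weight):
    """F_out = F_vit + Conv1x1(SE(DWConv3x3(LN(F_vit))))."""
    y = layer_norm(f_vit)
    y = depthwise_conv3x3(y, dw_kernels)
    y = squeeze_excite(y, se_w1, se_w2)
    y = y @ pw_weight                         # 1x1 conv = per-pixel linear map over channels
    return f_vit + y                          # residual connection


def gated_fusion(f_d, f_s, gate_logits):
    """F_output = F_d * M + F_s * (1 - M), with M = sigmoid(gate_logits) (assumption)."""
    m = sigmoid(gate_logits)
    return f_d * m + f_s * (1.0 - m)


# Usage with random stand-in tensors (sizes are arbitrary).
rng = np.random.default_rng(0)
H, W, C, R = 4, 4, 8, 2
f_vit = rng.normal(size=(H, W, C))
f_out = channel_attention_block(
    f_vit,
    dw_kernels=rng.normal(size=(3, 3, C)),
    se_w1=rng.normal(size=(C, C // R)),
    se_w2=rng.normal(size=(C // R, C)),
    pw_weight=rng.normal(size=(C, C)),
)
f_conv = rng.normal(size=(H, W, C))
fused = gated_fusion(f_out, f_conv, gate_logits=rng.normal(size=(H, W, C)))
print(f_out.shape, fused.shape)
```

Because \( M \) lies in (0, 1), the fused output is a per-element convex combination of the two branch features: where the mask is close to 1 the ViT-branch feature dominates, and where it is close to 0 the convolutional-branch feature does. Likewise, if the point-wise weights in the channel-attention block are zero, the block reduces to the identity, which is what the residual form of the first formula implies.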

DB-SAM: Delving into High Quality Universal Medical Image Segmentation
