SAM-UNet: Enhancing Zero-Shot Segmentation of SAM for Universal Medical Images

Sihan Yang, Haixia Bi, Hai Zhang, Jian Sun

2024-08-19

Abstract: Segment Anything Model (SAM) has demonstrated impressive performance on a wide range of natural image segmentation tasks. However, its performance deteriorates significantly when it is applied directly to the medical domain, due to the marked differences between natural images and medical images. Some researchers have attempted to train SAM on large-scale medical datasets, but the experimental results still show poor zero-shot performance. In this context, inspired by the superior performance of U-Net-like models in medical image segmentation, we propose SAM-UNet, a new foundation model which incorporates U-Net into the original SAM to fully leverage the powerful contextual modeling ability of convolutions. Specifically, we add a parallel convolutional branch to the image encoder, which is trained independently with the vision Transformer branch frozen. Additionally, we employ multi-scale fusion in the mask decoder to facilitate accurate segmentation of objects at different scales. We train SAM-UNet on SA-Med2D-16M, the largest 2-dimensional medical image segmentation dataset to date, yielding a universal pretrained model for medical images. Extensive experiments are conducted to evaluate the model, and state-of-the-art results are achieved, with a Dice similarity coefficient of 0.883 on the SA-Med2D-16M dataset. In zero-shot segmentation experiments, our model not only significantly outperforms previous large medical SAM models across all modalities, but also substantially mitigates the performance degradation seen on unseen modalities. SAM-UNet is an efficient and extensible foundation model, which can be further fine-tuned for other downstream tasks in the medical community. The code is available at https://github.com/Hhankyangg/sam-unet.
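For readers unfamiliar with the metric quoted in the abstract, the Dice similarity coefficient of a predicted mask A and a ground-truth mask B is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that standard definition for binary masks, not the paper's evaluation code; the function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks.

    Both masks are nested lists (rows of 0/1 values) of identical shape.
    Hypothetical helper for illustration; not taken from the SAM-UNet code.
    """
    # Flatten both masks and coerce every pixel to 0 or 1.
    flat_pred = [int(bool(v)) for row in pred for v in row]
    flat_target = [int(bool(v)) for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must contain the same number of pixels")

    # |A ∩ B|: pixels that are foreground in both masks.
    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    # |A| + |B|: total foreground pixels across the two masks.
    total = sum(flat_pred) + sum(flat_target)

    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
score = dice_coefficient(pred_mask, true_mask)  # 2 * 2 / (3 + 3)
```

A score of 1.0 means the two masks coincide exactly, and 0.0 means they share no foreground pixels.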

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the poor zero-shot performance of the Segment Anything Model (SAM) in medical image segmentation. SAM performs very well on natural images, but its accuracy drops significantly when it is applied directly to medical images, because the two domains differ substantially. Training SAM on large-scale medical datasets has not closed the gap: such models still segment poorly in the zero-shot setting on unseen modalities (such as microscopy, pathology, and X-ray images).

To address this challenge, the paper proposes SAM-UNet, a new foundation model that integrates the U-Net structure into the original SAM in order to exploit the strength of convolutional networks at modeling local information. The specific improvements are:

1. **Dual-branch image encoder**: A parallel convolutional neural network (CNN) branch is introduced in the image encoder, while the original vision Transformer (ViT) branch is kept frozen to retain SAM's encoding ability for natural images.
2. **Multi-scale fusion mask decoder**: A multi-scale fusion strategy is adopted in the mask decoder to improve the segmentation accuracy of objects at different scales.
3. **New output token design**: The Med-Output Token is introduced to replace the original IoU prediction token and multiple mask tokens, reducing ambiguity and improving efficiency.

With these improvements, SAM-UNet achieves state-of-the-art zero-shot segmentation performance on multiple medical image modalities, especially unseen ones. The paper also reports the results of training SAM-UNet on the large-scale medical image dataset SA-Med2D-16M and verifies its effectiveness and robustness through extensive experiments.
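The dual-branch encoder idea described in the summary can be sketched in PyTorch as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the class name `DualBranchEncoder`, the tiny stand-in modules used for the two branches, and the element-wise addition used to fuse their features are all hypothetical choices. The only thing taken from the source is the idea that the ViT branch stays frozen while a parallel convolutional branch is trained.

```python
import torch
import torch.nn as nn


class DualBranchEncoder(nn.Module):
    """Frozen transformer-style branch plus a trainable convolutional branch.

    Hypothetical sketch: the fusion by addition is an assumption, not
    something confirmed by the paper summary.
    """

    def __init__(self, vit_branch: nn.Module, cnn_branch: nn.Module):
        super().__init__()
        self.vit_branch = vit_branch
        self.cnn_branch = cnn_branch
        # Freeze the pretrained branch so only the CNN branch is updated.
        for param in self.vit_branch.parameters():
            param.requires_grad = False

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # No gradients are needed through the frozen branch.
        with torch.no_grad():
            vit_features = self.vit_branch(image)
        cnn_features = self.cnn_branch(image)
        # Assumed fusion: element-wise sum of same-shaped feature maps.
        return vit_features + cnn_features


# Stand-in modules (NOT the real SAM ViT or the paper's CNN branch):
# both map a 3x16x16 image to an 8x4x4 feature map.
vit_stub = nn.Conv2d(3, 8, kernel_size=4, stride=4)
cnn_stub = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=3, stride=4, padding=1),
)

encoder = DualBranchEncoder(vit_stub, cnn_stub)
features = encoder(torch.randn(1, 3, 16, 16))
features.sum().backward()  # gradients reach only the CNN branch
```

In a training loop built on this sketch, only the parameters with `requires_grad=True` (here, those of `cnn_branch`) would be passed to the optimizer, which keeps the pretrained branch's weights unchanged.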

nnSAM: Plug-and-play Segment Anything Model Improves nnUNet Performance

Dr-SAM: U-Shape Structure Segment Anything Model for Generalizable Medical Image Segmentation

SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation

Plug‐and‐play segment anything model improves nnUNet performance

SAM-Med2D

DB-SAM: Delving into High Quality Universal Medical Image Segmentation

MA-SAM: Modality-agnostic SAM adaptation for 3D medical image segmentation

SEG-SAM: Semantic-Guided SAM for Unified Medical Image Segmentation

Accuracy of Segment-Anything Model (SAM) in medical image segmentation tasks

Input Augmentation with SAM: Boosting Medical Image Segmentation with Segmentation Foundation Model

$\mathrm{SAM^{Med}}$: A medical image annotation framework based on large vision model

UltraSam: A Foundation Model for Ultrasound using Large Open-Access Segmentation Datasets

Empirical Evaluation of the Segment Anything Model (SAM) for Brain Tumor Segmentation

SAM.MD: Zero-shot medical image segmentation capabilities of the Segment Anything Model

Medical SAM 2: Segment medical images as video via Segment Anything Model 2

SimSAM: Zero-shot Medical Image Segmentation via Simulated Interaction

No More Training: SAM's Zero-Shot Transfer Capabilities for Cost-Efficient Medical Image Segmentation

SAM-Med3D: Towards General-purpose Segmentation Models for Volumetric Medical Images

SAM-IE: SAM-based Image Enhancement for Facilitating Medical Image Diagnosis with Segmentation Foundation Model

Customized Segment Anything Model for Medical Image Segmentation