SAM-UNet: Enhancing Zero-Shot Segmentation of SAM for Universal Medical Images

Sihan Yang, Haixia Bi, Hai Zhang, Jian Sun
2024-08-19
Abstract: Segment Anything Model (SAM) has demonstrated impressive performance on a wide range of natural image segmentation tasks. However, its performance deteriorates significantly when it is applied directly to the medical domain, due to the substantial differences between natural images and medical images. Some researchers have attempted to train SAM on large-scale medical datasets, but the experimental results still show poor zero-shot performance. In this context, inspired by the superior performance of U-Net-like models in medical image segmentation, we propose SAM-UNet, a new foundation model which incorporates U-Net into the original SAM to fully leverage the powerful contextual modeling ability of convolutions. Specifically, we add a parallel convolutional branch to the image encoder, which is trained independently while the vision Transformer branch is kept frozen. Additionally, we employ multi-scale fusion in the mask decoder to facilitate accurate segmentation of objects at different scales. We train SAM-UNet on SA-Med2D-16M, the largest 2-dimensional medical image segmentation dataset to date, yielding a universal pretrained model for medical images. Extensive experiments are conducted to evaluate the performance of the model, and state-of-the-art results are achieved, with a Dice similarity coefficient of 0.883 on the SA-Med2D-16M dataset. In zero-shot segmentation experiments, our model not only significantly outperforms previous large medical SAM models across all modalities, but also substantially mitigates the performance degradation seen on unseen modalities. SAM-UNet is an efficient and extensible foundation model that can be further fine-tuned for other downstream tasks in the medical community. The code is available at https://github.com/Hhankyangg/sam-unet.
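For readers unfamiliar with the metric quoted in the abstract, the Dice similarity coefficient between a predicted mask and a ground-truth mask is twice the size of their overlap divided by the sum of their sizes. The following is a minimal pure-Python sketch of that standard definition, not the paper's evaluation code. The function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
from itertools import chain


def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks.

    Both masks are nested lists (rows of 0/1 values) with the same shape.
    Illustrative helper only; not taken from the SAM-UNet code base.
    """
    p = list(chain.from_iterable(pred))
    t = list(chain.from_iterable(target))
    if len(p) != len(t):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels that are foreground in both masks.
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    # Total foreground pixels in the prediction plus the ground truth.
    total = sum(1 for a in p if a) + sum(1 for b in t if b)

    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = dice_coefficient(pred, target)  # 2 * 2 / (3 + 3)
```

Here the two masks share 2 foreground pixels and each contains 3, so the score is 4/6. A score of 1.0 means the masks coincide exactly and 0.0 means they do not overlap at all.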
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the poor zero-shot performance of the Segment Anything Model (SAM) on medical image segmentation. Although SAM performs well on natural images, its performance drops significantly when it is applied directly to medical images, because the two domains differ substantially. Training SAM on large-scale medical datasets, as some researchers have attempted, still does not yield good zero-shot segmentation on unseen modalities (such as microscopy, pathology, and X-ray images).

To address this challenge, the paper proposes SAM-UNet, a new foundation model. By integrating the U-Net structure into the original SAM, it exploits the strength of convolutional networks in modeling local information. The specific improvements are:

1. **Dual-branch image encoder**: A parallel convolutional neural network (CNN) branch is added to the image encoder, while the original vision Transformer (ViT) branch is kept frozen to retain SAM's encoding ability for natural images.
2. **Multi-scale fusion mask decoder**: A multi-scale fusion strategy is adopted in the mask decoder to improve segmentation accuracy for objects at different scales.
3. **New output token design**: The Med-Output Token replaces the original IoU prediction token and multiple mask tokens, reducing ambiguity and improving efficiency.

With these improvements, SAM-UNet achieves state-of-the-art zero-shot segmentation performance across multiple medical image modalities, especially unseen ones. The paper also reports the results of training SAM-UNet on the large-scale medical image dataset SA-Med2D-16M and verifies its effectiveness and robustness through extensive experiments.
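The dual-branch encoder idea described above (a frozen Transformer branch alongside a trainable convolutional branch) can be sketched in PyTorch as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the class name `DualBranchEncoder`, the tiny stand-in modules, and the element-wise addition used to combine the two feature maps are all hypothetical, since the summary does not specify how the branches are fused.

```python
import torch
import torch.nn as nn


class DualBranchEncoder(nn.Module):
    """Toy dual-branch encoder: frozen 'ViT' branch plus trainable CNN branch."""

    def __init__(self, vit_branch: nn.Module, cnn_branch: nn.Module):
        super().__init__()
        self.vit_branch = vit_branch
        self.cnn_branch = cnn_branch
        # Freeze the Transformer branch so only the CNN branch is trained.
        for param in self.vit_branch.parameters():
            param.requires_grad = False

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # No gradients are needed for the frozen branch.
        with torch.no_grad():
            vit_features = self.vit_branch(x)
        cnn_features = self.cnn_branch(x)
        # Assumption: fuse by element-wise addition (shapes must match).
        return vit_features + cnn_features


# Stand-in for a ViT: a single 16x16 patch-embedding convolution.
toy_vit = nn.Conv2d(3, 8, kernel_size=16, stride=16)
# Stand-in CNN branch that downsamples by the same overall factor of 16.
toy_cnn = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=4, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=3, stride=4, padding=1),
)

encoder = DualBranchEncoder(toy_vit, toy_cnn)
features = encoder(torch.randn(1, 3, 64, 64))

# Only CNN-branch parameters remain trainable.
trainable = [name for name, p in encoder.named_parameters() if p.requires_grad]
```

Passing only the parameters with `requires_grad=True` to the optimizer keeps the pretrained branch untouched during training, which is the property the summary attributes to the frozen ViT branch.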