Abstract: Segment Anything Model (SAM), a prompt-driven foundation model for natural image segmentation, has demonstrated impressive zero-shot performance. However, SAM does not work when directly applied to medical image segmentation tasks: it cannot predict semantic labels for the masks it produces, and it requires extra prompts, such as points or boxes, to segment target regions. In addition, there is a large gap between 2D natural images and 3D medical images, so SAM performs poorly on medical image segmentation tasks. To address these issues, we propose MaskSAM, a novel prompt-free, mask-classification SAM adaptation framework for medical image segmentation. We design a prompt generator, combined with the image encoder in SAM, that generates a set of auxiliary classifier tokens, auxiliary binary masks, and auxiliary bounding boxes. Each pair of auxiliary mask and box prompts removes the need for extra prompts, and is associated with a class label prediction through the sum of its auxiliary classifier token and a learnable global classifier token in the mask decoder of SAM, which provides the semantic labels. We also design a 3D depth-convolution adapter for image embeddings and a 3D depth-MLP adapter for prompt embeddings. We inject one of them into each transformer block in the image encoder and mask decoder to enable pre-trained 2D SAM models to extract 3D information and adapt to 3D medical images. Our method achieves state-of-the-art performance on AMOS2022 with 90.52% Dice, an improvement of 2.7% over nnUNet. It also surpasses nnUNet by 1.7% on ACDC and 1.0% on Synapse.
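
The idea of letting a 2D model see 3D context can be illustrated with a toy example. The sketch below is not the authors' adapter: it assumes, for illustration only, that a depth adapter applies a small convolution along the slice axis and adds the result back to the per-slice 2D features. The function name `depth_conv_adapter` and the shapes are hypothetical.

```python
# Hypothetical sketch of a residual "depth" adapter (not the paper's code).
# Each 2D slice has already been encoded independently into a feature vector;
# the adapter mixes information across neighbouring slices.

def depth_conv_adapter(slices, kernel):
    """Convolve along the depth (slice) axis per channel, then add a residual.

    slices: list of D feature vectors, one per 2D slice, each a list of C floats.
    kernel: odd-length list of weights, shared by all channels (an assumption).
    """
    depth = len(slices)
    channels = len(slices[0])
    half = len(kernel) // 2
    out = []
    for d in range(depth):
        mixed = []
        for c in range(channels):
            acc = 0.0
            for k, w in enumerate(kernel):
                src = d + k - half
                if 0 <= src < depth:  # zero padding outside the volume
                    acc += w * slices[src][c]
            mixed.append(slices[d][c] + acc)  # residual keeps the 2D feature
        out.append(mixed)
    return out


features = [[1.0], [2.0], [3.0]]  # three slices, one channel each
mixed = depth_conv_adapter(features, [1.0, 0.0, 1.0])
print(mixed)  # [[3.0], [6.0], [5.0]]
```

With an all-zero kernel the function returns its input unchanged, which is one reason residual adapters of this kind can be added to a pre-trained network without disturbing it at the start of training.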

What problem does this paper attempt to address?

This paper addresses the challenges of extending the Segment Anything Model (SAM) from natural image segmentation to medical image segmentation. When SAM is applied directly to medical images, three main problems arise:

1. **Lack of semantic label prediction**: The binary masks generated by SAM carry no semantic labels, while medical image segmentation tasks usually involve multiple objects with different semantic labels.
2. **Requirement for additional prompts**: SAM requires users to provide precise prompts (such as points or boxes) to segment the target area, which can be difficult in practice, especially for users without medical knowledge.
3. **Insufficient support for 3D medical images**: SAM is designed for 2D natural images, while many medical scans (such as MRI and CT) are 3D volumes, so it performs poorly on 3D medical data.

To overcome these problems, the paper proposes MaskSAM, a prompt-free SAM adaptation framework for medical image segmentation. Its main contributions are:

1. **Proposing a prompt-free architecture**: A prompt generator automatically produces auxiliary binary masks and bounding boxes that serve as prompts, eliminating the need for manual prompting.
2. **Introducing auxiliary classifier tokens**: The prompt generator also produces auxiliary classifier tokens, which are summed with learnable global classifier tokens so that the model can predict a semantic label for each binary mask.
3. **Designing 3D depth-convolution adapters and 3D depth-MLP adapters**: These adapters are injected into each transformer block of the image encoder and mask decoder, enabling the pre-trained 2D SAM model to extract 3D information and adapt to 3D medical image segmentation.
4. **Conducting extensive experimental verification**: Experiments on three challenging datasets (AMOS2022, ACDC, and Synapse) show that MaskSAM achieves state-of-the-art Dice scores, improving on nnUNet by 2.7%, 1.7%, and 1.0%, respectively.

With these changes, MaskSAM adapts the pre-trained SAM to medical image segmentation and improves on nnUNet on all three datasets.
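
The classifier-token sum and the Dice evaluation mentioned above can be sketched in a few lines. This is an illustrative toy, not the MaskSAM implementation: the linear classifier `class_weights`, the per-pixel rule in `semantic_labels` (a rule commonly used in mask-classification models, assumed here), and all numbers are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_masks(aux_tokens, global_tokens, class_weights):
    """Sum each auxiliary token with its global token, then score the classes
    with a linear layer (class_weights holds one weight vector per class)."""
    probs = []
    for aux, glob in zip(aux_tokens, global_tokens):
        token = [a + g for a, g in zip(aux, glob)]
        logits = [sum(w * t for w, t in zip(row, token)) for row in class_weights]
        probs.append(softmax(logits))
    return probs

def semantic_labels(class_probs, mask_probs):
    """Per pixel, pick the class that maximises sum_i p_i(class) * m_i(pixel).
    This combination rule is an assumption, not taken from the paper."""
    num_classes = len(class_probs[0])
    num_pixels = len(mask_probs[0])
    labels = []
    for p in range(num_pixels):
        scores = [
            sum(cp[c] * mp[p] for cp, mp in zip(class_probs, mask_probs))
            for c in range(num_classes)
        ]
        labels.append(scores.index(max(scores)))
    return labels

def dice(pred, target, label):
    """Dice = 2|A ∩ B| / (|A| + |B|) for one label over flattened pixels."""
    a = [p == label for p in pred]
    b = [t == label for t in target]
    inter = sum(1 for x, y in zip(a, b) if x and y)
    denom = sum(a) + sum(b)
    return 1.0 if denom == 0 else 2.0 * inter / denom


# Two masks, two classes, four pixels (all values are made up).
aux_tokens = [[1.0, 0.0], [0.0, 1.0]]
global_tokens = [[0.5, 0.0], [0.0, 0.5]]
class_weights = [[4.0, 0.0], [0.0, 4.0]]
mask_probs = [[0.9, 0.8, 0.1, 0.2], [0.1, 0.2, 0.9, 0.7]]

class_probs = classify_masks(aux_tokens, global_tokens, class_weights)
pred = semantic_labels(class_probs, mask_probs)
target = [0, 0, 1, 0]
print(pred)                              # [0, 0, 1, 1]
print(round(dice(pred, target, 0), 3))   # 0.8
print(round(dice(pred, target, 1), 3))   # 0.667
```

Reported Dice scores on multi-organ benchmarks are typically averaged over labels and cases; the per-label function here shows only the core formula.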

MaskSAM: Towards Auto-prompt SAM with Mask Classification for Medical Image Segmentation

AM-SAM: Automated Prompting and Mask Calibration for Segment Anything Model

PA-SAM: Prompt Adapter SAM for High-Quality Image Segmentation

SAM on Medical Images: A Comprehensive Study on Three Prompt Modes

SAM-MPA: Applying SAM to Few-shot Medical Image Segmentation using Mask Propagation and Auto-prompting

MA-SAM: Modality-agnostic SAM adaptation for 3D medical image segmentation

Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding

SAM-Med2D

DeSAM: Decoupled Segment Anything Model for Generalizable Medical Image Segmentation

RefSAM3D: Adapting SAM with Cross-modal Reference for 3D Medical Image Segmentation

Mask-Enhanced Segment Anything Model for Tumor Lesion Semantic Segmentation

SAM Fails to Segment Anything? – SAM-Adapter: Adapting SAM in Underperformed Scenes: Camouflage, Shadow, Medical Image Segmentation, and More

How to Efficiently Adapt Large Segmentation Model(SAM) to Medical Images

Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation

CoSAM: Self-Correcting SAM for Domain Generalization in 2D Medical Image Segmentation

SimSAM: Zero-shot Medical Image Segmentation via Simulated Interaction

ESP-MedSAM: Efficient Self-Prompting SAM for Universal Domain-Generalized Medical Image Segmentation

Beyond Adapting SAM: Towards End-to-End Ultrasound Image Segmentation via Auto Prompting

SAM Fewshot Finetuning for Anatomical Segmentation in Medical Images

AutoProSAM: Automated Prompting SAM for 3D Multi-Organ Segmentation

Automating MedSAM by Learning Prompts with Weak Few-Shot Supervision