Abstract: The Segment Anything Model (SAM) has garnered significant attention for its versatile segmentation abilities and intuitive prompt-based interface. However, its application in medical imaging presents challenges, requiring either substantial training costs and extensive medical datasets for full model fine-tuning or high-quality prompts for optimal performance. This paper introduces H-SAM: a prompt-free adaptation of SAM tailored for efficient fine-tuning of medical images via a two-stage hierarchical decoding procedure. In the initial stage, H-SAM employs SAM's original decoder to generate a prior probabilistic mask, guiding a more intricate decoding process in the second stage. Specifically, we propose two key designs: 1) A class-balanced, mask-guided self-attention mechanism addressing the unbalanced label distribution, enhancing image embedding; 2) A learnable mask cross-attention mechanism spatially modulating the interplay among different image regions based on the prior mask. Moreover, the inclusion of a hierarchical pixel decoder in H-SAM enhances its proficiency in capturing fine-grained and localized details. This approach enables SAM to effectively integrate learned medical priors, facilitating enhanced adaptation for medical image segmentation with limited samples. Our H-SAM demonstrates a 4.78% improvement in average Dice compared to existing prompt-free SAM variants for multi-organ segmentation using only 10% of 2D slices. Notably, without using any unlabeled data, H-SAM even outperforms state-of-the-art semi-supervised models relying on extensive unlabeled training data across various medical datasets. Our code is available at
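
The abstract reports results as average Dice. As a reminder of what that metric measures, here is a minimal sketch in plain Python, assuming predictions and ground truth are flat lists of integer class labels. The function names, the toy labels, and the convention of scoring an absent class as 1.0 are illustrative assumptions, not taken from the paper's evaluation code.

```python
def dice_coefficient(pred, target, label):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for one class label."""
    pred_hits = [p == label for p in pred]
    target_hits = [t == label for t in target]
    intersection = sum(p and t for p, t in zip(pred_hits, target_hits))
    total = sum(pred_hits) + sum(target_hits)
    if total == 0:
        # Assumption: a class absent from both masks counts as a perfect match.
        return 1.0
    return 2.0 * intersection / total


def mean_dice(pred, target, labels):
    """Average the per-class Dice over the given foreground labels."""
    return sum(dice_coefficient(pred, target, c) for c in labels) / len(labels)


# Toy example: six pixels, background 0 and two "organ" labels 1 and 2.
pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 2, 0]
print(mean_dice(pred, target, labels=[1, 2]))
```

Averaging per class rather than per pixel means that a small organ contributes as much to the score as a large one, which is one reason class imbalance matters for this metric.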

What problem does this paper attempt to address?

This paper addresses how to efficiently fine-tune large foundation models such as the Segment Anything Model (SAM) for medical image segmentation while reducing the dependence on large amounts of labeled data. Although SAM has strong zero-shot segmentation capabilities on natural images, it performs poorly on medical images, mainly because it was not exposed to medical images during training. Existing approaches have their own limitations: they require either a large amount of labeled data for full-model fine-tuning or high-quality prompts to reach good performance. The paper therefore proposes H-SAM, a prompt-free SAM variant that aims to fine-tune the model efficiently with limited medical data and to improve the accuracy of multi-organ segmentation through a two-stage hierarchical decoding process.

The main innovations of H-SAM are:

1. **Hierarchical Decoding**: H-SAM adopts a two-stage hierarchical decoding strategy. In the first stage, SAM's original decoder generates a prior probability mask; in the second stage, a more refined decoding step builds on that prior.
2. **Class-Balanced Mask-Guided Self-Attention Mechanism**: To address class imbalance, H-SAM introduces a class-balanced, mask-guided self-attention mechanism that enhances the image embedding by increasing the variation of tail classes.
3. **Learnable Mask Cross-Attention Mechanism**: A learnable mask cross-attention mechanism uses the prior mask to spatially modulate the interaction between different image regions, improving segmentation quality.
4. **Hierarchical Pixel Decoder**: To capture fine-grained local details, H-SAM adds a hierarchical pixel decoder which, combined with U-Net-style skip connections, further improves segmentation accuracy.

The experimental results show that H-SAM performs well on multi-organ segmentation, especially in the few-shot setting. Using only 10% of 2D slices, it achieves an average Dice coefficient of 80.35%, outperforming existing prompt-free SAM variants as well as semi-supervised methods. H-SAM also performs strongly on prostate and left atrium segmentation: even without using any unlabeled data, it outperforms semi-supervised models that rely on a large amount of unlabeled data.
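
To make the two-stage idea more concrete, the following is a rough sketch of how a prior mask from a first decoding stage could bias cross-attention in a second stage. It is not the paper's formulation: the additive bias `gamma * prior`, the scalar `gamma` (standing in for a learnable parameter), the function names, and the toy tensors are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def mask_biased_attention(query, keys, values, prior, gamma):
    """Single-query attention whose logits are shifted by gamma * prior.

    prior[i] is the stage-one foreground probability of image token i;
    gamma is a hypothetical stand-in for a learned modulation strength.
    """
    scale = 1.0 / math.sqrt(len(query))
    logits = [
        scale * sum(q * k for q, k in zip(query, key)) + gamma * p
        for key, p in zip(keys, prior)
    ]
    weights = softmax(logits)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, out


# Four image tokens with identical keys, so unbiased attention is uniform.
query = [1.0, 0.0]
keys = [[0.5, 0.5]] * 4
values = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
prior = [0.9, 0.8, 0.1, 0.05]  # stand-in for the stage-one probability mask

uniform_w, _ = mask_biased_attention(query, keys, values, prior, gamma=0.0)
guided_w, guided_out = mask_biased_attention(query, keys, values, prior, gamma=4.0)
print(uniform_w)
print(guided_w, guided_out)
```

With `gamma=0.0` the prior is ignored and all four tokens receive equal weight; with a positive `gamma` the attention shifts toward tokens that the first stage marked as likely foreground. A soft additive bias is used here rather than a hard binary mask so that a mistaken prior can still be overridden by the content-based logits.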

Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding

Integrating Spatial Prior Adapter for Enhancing SAM Performance in Medical Image Segmentation

MaskSAM: Towards Auto-prompt SAM with Mask Classification for Medical Image Segmentation

Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation

AM-SAM: Automated Prompting and Mask Calibration for Segment Anything Model

SEG-SAM: Semantic-Guided SAM for Unified Medical Image Segmentation

PA-SAM: Prompt Adapter SAM for High-Quality Image Segmentation

DeSAM: Decoupled Segment Anything Model for Generalizable Medical Image Segmentation

S-SAM: SVD-based Fine-Tuning of Segment Anything Model for Medical Image Segmentation

How to Efficiently Adapt Large Segmentation Model(SAM) to Medical Images

MA-SAM: Modality-agnostic SAM adaptation for 3D medical image segmentation

ESP-MedSAM: Efficient Self-Prompting SAM for Universal Domain-Generalized Medical Image Segmentation

SAM-Med2D

Customized Segment Anything Model for Medical Image Segmentation

Self-Sampling Meta SAM: Enhancing Few-shot Medical Image Segmentation with Meta-Learning

SAM-MPA: Applying SAM to Few-shot Medical Image Segmentation using Mask Propagation and Auto-prompting

SAM Fails to Segment Anything? – SAM-Adapter: Adapting SAM in Underperformed Scenes: Camouflage, Shadow, Medical Image Segmentation, and More

RefSAM3D: Adapting SAM with Cross-modal Reference for 3D Medical Image Segmentation

AGSAM: Agent-Guided Segment Anything Model for Automatic Segmentation in Few-Shot Scenarios

SAM Fewshot Finetuning for Anatomical Segmentation in Medical Images