Automating MedSAM by Learning Prompts with Weak Few-Shot Supervision

Mélanie Gaillochet, Christian Desrosiers, Hervé Lombaert
2024-09-30
Abstract: Foundation models such as the recently introduced Segment Anything Model (SAM) have achieved remarkable results in image segmentation tasks. However, these models typically require user interaction through handcrafted prompts such as bounding boxes, which limits their deployment to downstream tasks. Adapting these models to a specific task with fully labeled data also demands expensive prior user interaction to obtain ground-truth annotations. This work proposes to replace conditioning on input prompts with a lightweight module that directly learns a prompt embedding from the image embedding, both of which are subsequently used by the foundation model to output a segmentation mask. Our foundation models with learnable prompts can automatically segment any specific region by 1) modifying the input through a prompt embedding predicted by a simple module, and 2) using weak labels (tight bounding boxes) and few-shot supervision (10 samples). Our approach is validated on MedSAM, a version of SAM fine-tuned for medical images, with results on three medical datasets in MR and ultrasound imaging. Our code is available at https://github.com/Minimel/MedSAMWeakFewShotPromptAutomation.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper aims to reduce the dependence on large amounts of labeled data and on user interaction in medical image segmentation. Specifically, it proposes a way to automate MedSAM (a version of the Segment Anything Model fine-tuned for medical images): a prompt embedding is learned directly from a small number of weakly labeled samples (i.e., samples annotated only with tight bounding boxes), so that a specific region can be segmented without any user-provided prompt. The goal is to improve performance in the few-shot setting while reducing the cost and complexity of developing specialized segmentation models.

### Core contributions of the paper:

1. **Automated prompt module**: A lightweight prompt module generates prompt embeddings from the embedding of the input image, replacing the prompts that users would otherwise have to provide manually.
2. **Weakly supervised and few-shot learning**: The module can be trained with only a small number of samples labeled with tight bounding boxes, greatly reducing the need for fully labeled data.
3. **No need to fine-tune MedSAM**: The module is added directly to MedSAM without fine-tuning MedSAM itself, which preserves the generality of the base model.

### Method overview:

- **Design of the prompt module**: The prompt module consists of two main parts: a convolutional layer that generates dense embeddings and a fully connected layer that generates sparse embeddings. These two embeddings are combined with the MedSAM image embedding to generate the final segmentation mask.
- **Design of the loss function**: To exploit the weak tight-bounding-box labels, the paper uses three loss terms:
  - **Empty region loss** ($L_{\text{empty}}$): Ensures that the area outside the bounding box is predicted as background only.
  - **Tight-box constraint loss** ($L_{\text{tightbox}}$): Ensures that every horizontal and vertical line segment crossing the bounding box contains at least one foreground pixel.
  - **Foreground size constraint loss** ($L_{\text{size}}$): Ensures that the size of the predicted foreground area lies within a certain range.

### Experimental results:

- **Datasets**: The method is validated on three publicly available medical image datasets: HC18, CAMUS, and ACDC.
- **Performance comparison**: Even with only 10 samples, the proposed method is reported to outperform UNet and TransUNet models trained with fully labeled data on multiple tasks. The performance degradation is reported to be smaller on right ventricle (RV) segmentation in particular.

### Conclusion:

The proposed method effectively automates MedSAM, enabling it to achieve high-quality medical image segmentation with only a small number of weakly labeled samples. This reduces the cost of data labeling and improves the robustness of the model in the few-shot setting.
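To make the three box-derived loss terms from the method overview more concrete, here is a minimal pure-Python sketch of how such constraints could be turned into penalties on a predicted foreground-probability map. The function names, the hinge-style penalty forms, the normalizations, and the size bounds `low_frac` and `high_frac` are all illustrative assumptions; the summary above does not give the paper's exact formulation, which may differ.

```python
# Illustrative penalties for the three constraints derived from a tight box.
# ASSUMPTIONS: hinge-style penalties, the normalizations, and the default
# size bounds are hypothetical choices, not the paper's exact losses.
# `prob` is a 2D list of foreground probabilities in [0, 1].
# `box` is (r0, r1, c0, c1) with inclusive row and column bounds.


def empty_region_loss(prob, box):
    """Mean foreground probability outside the box (ideally 0)."""
    r0, r1, c0, c1 = box
    outside = [
        p
        for r, row in enumerate(prob)
        for c, p in enumerate(row)
        if not (r0 <= r <= r1 and c0 <= c <= c1)
    ]
    return sum(outside) / len(outside) if outside else 0.0


def tight_box_loss(prob, box):
    """Penalize row/column segments in the box whose foreground mass is below 1."""
    r0, r1, c0, c1 = box
    row_sums = [sum(prob[r][c0:c1 + 1]) for r in range(r0, r1 + 1)]
    col_sums = [
        sum(prob[r][c] for r in range(r0, r1 + 1)) for c in range(c0, c1 + 1)
    ]
    segments = row_sums + col_sums
    # Each segment should contain at least one foreground pixel.
    return sum(max(0.0, 1.0 - s) for s in segments) / len(segments)


def size_loss(prob, box, low_frac=0.1, high_frac=0.9):
    """Penalize a predicted foreground size outside [low, high] of the box area."""
    r0, r1, c0, c1 = box
    area = (r1 - r0 + 1) * (c1 - c0 + 1)
    size = sum(sum(row) for row in prob)
    low, high = low_frac * area, high_frac * area
    return (max(0.0, low - size) + max(0.0, size - high)) / area


def total_loss(prob, box):
    """Unweighted sum of the three terms (the weighting is an assumption)."""
    return (
        empty_region_loss(prob, box)
        + tight_box_loss(prob, box)
        + size_loss(prob, box)
    )
```

In this sketch, a prediction that stays inside the box, touches every row and column of the box, and has a plausible size incurs zero penalty, while an all-background prediction is penalized by the tight-box and size terms. In an actual training setup these terms would be written with a differentiable tensor library so that gradients reach the prompt module.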