Abstract:
Background: Large foundation models, such as the Segment Anything Model (SAM), have shown remarkable performance in image segmentation tasks. However, how best to make these models useful for domain-specific applications, such as medical image segmentation, remains an open question. Recent studies have released MedSAM, a medical version of the foundation model trained on vast amounts of medical data, which promised state-of-the-art (SOTA) medical segmentation. Independent inspection and dissection by the community are needed.
Purpose: This study assesses the performance of the off-the-shelf medical foundation model MedSAM for segmenting anatomical structures in pelvic MR images. We also evaluate its dependency on the prompting scheme and demonstrate the gain from further specialized fine-tuning.
Methods: MedSAM and its lightweight version LiteMedSAM were evaluated out-of-the-box on a public MR dataset consisting of 589 pelvic images split 80:20 for training and testing. An nnU-Net model was trained from scratch to serve as a benchmark and to provide bounding box prompts for MedSAM. MedSAM was evaluated using bounding boxes of differing quality: those derived from ground truth labels, those derived from nnU-Net, and those derived from the former two but with 5-pixel isometric expansion. Lastly, LiteMedSAM was fine-tuned on the training set and reevaluated on this task.
Results: Out-of-the-box MedSAM and LiteMedSAM both performed poorly across the structure set, especially for disjoint or non-convex structures. Varying the bounding box prompt had minimal effect. The mean Dice scores and mean Hausdorff distances (in mm) for the obturator internus using MedSAM and LiteMedSAM were {0.251 +/- 0.110, 0.101 +/- 0.079} and {34.142 +/- 5.196, 33.688 +/- 5.306}, respectively. Fine-tuning of LiteMedSAM led to a significant performance gain, improving the Dice score and Hausdorff distance for the obturator internus to 0.864 +/- 0.123 and 5.022 +/- 10.684 mm, on par with nnU-Net, with no significant difference for most structures. All segmented structures benefited significantly from specialized refinement, with varying margins of improvement.
Conclusion: Our study points to the potential of deep learning models like MedSAM and LiteMedSAM for medical segmentation but also highlights the need for specialized refinement and adjudication: off-the-shelf use of such large foundation models is quite likely to be suboptimal, and specialized fine-tuning can significantly enhance segmentation accuracy.
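The prompting and scoring steps described in the Methods can be illustrated with a minimal sketch. It assumes 2D binary masks stored as NumPy arrays; the function names `bbox_from_mask` and `dice_score` are hypothetical and are not taken from the MedSAM or nnU-Net code. The sketch derives a bounding box prompt from a label mask, optionally expands it by a fixed pixel margin on every side (for example, 5 pixels), and computes the Dice score between a prediction and the ground truth.

```python
import numpy as np


def bbox_from_mask(mask, margin=0):
    """Return (x_min, y_min, x_max, y_max) enclosing the mask foreground.

    The box is expanded by `margin` pixels on every side and clipped to the
    image bounds. Assumes `mask` is a 2D array with non-zero foreground.
    """
    mask = np.asarray(mask)
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask has no foreground pixels")
    height, width = mask.shape
    x_min = max(int(xs.min()) - margin, 0)
    y_min = max(int(ys.min()) - margin, 0)
    x_max = min(int(xs.max()) + margin, width - 1)
    y_max = min(int(ys.max()) + margin, height - 1)
    return x_min, y_min, x_max, y_max


def dice_score(pred, target):
    """Dice coefficient 2|A n B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks empty: treated here as perfect agreement (an assumption).
        return 1.0
    return float(2.0 * np.logical_and(pred, target).sum() / denom)


# Illustrative usage on a synthetic 64x64 mask (not data from the study).
label = np.zeros((64, 64), dtype=np.uint8)
label[20:40, 10:30] = 1
tight_box = bbox_from_mask(label)               # box from the label itself
expanded_box = bbox_from_mask(label, margin=5)  # same box with 5-pixel expansion
```

The same `bbox_from_mask` call could be applied to an nnU-Net prediction instead of the ground truth label to obtain the second kind of prompt described above. The Hausdorff distance is omitted here because it additionally depends on the pixel spacing in mm.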

Comparative Eminence: Foundation versus Domain-Specific Model for Cardiac Ultrasound Segmentation

UltraSam: A Foundation Model for Ultrasound using Large Open-Access Segmentation Datasets

The Ability of Segmenting Anything Model (SAM) to Segment Ultrasound Images.

Foundation Models for Biomedical Image Segmentation: A Survey

How to build the best medical image segmentation algorithm using foundation models: a comprehensive empirical study with Segment Anything Model

Generalist Vision Foundation Models for Medical Imaging: A Case Study of Segment Anything Model on Zero-Shot Medical Segmentation

Beyond Adapting SAM: Towards End-to-End Ultrasound Image Segmentation via Auto Prompting

SAM.MD: Zero-shot medical image segmentation capabilities of the Segment Anything Model

MediViSTA: Medical Video Segmentation via Temporal Fusion SAM Adaptation for Echocardiography

The potential of 'Segment Anything' (SAM) for universal intelligent ultrasound image guidance

Segment Any Medical Model Extended

Multi-Prompt Fine-Tuning of Foundation Models for Enhanced Medical Image Segmentation

SAM-UNet: Enhancing Zero-Shot Segmentation of SAM for Universal Medical Images

Segment Anything Model for Medical Image Analysis: an Experimental Study

Necessity and Impact of Specialization of Large Foundation Model for Medical Segmentation Tasks

Enhancing left ventricular segmentation in echocardiography with a modified mixed attention mechanism in SegFormer architecture

Are foundation models efficient for medical image segmentation?

CC-SAM: SAM with Cross-feature Attention and Context for Ultrasound Image Segmentation

ClickSAM: Fine-tuning Segment Anything Model using click prompts for ultrasound image segmentation