Abstract:

Background: Large foundation models, such as the Segment Anything Model (SAM), have shown remarkable performance in image segmentation tasks. However, how best to realize the utility of these models in domain‐specific applications, such as medical image segmentation, remains an open question. Recent studies have released MedSAM, a medical version of the foundation model trained on vast medical data, which promises state‐of‐the‐art (SOTA) medical segmentation. Independent inspection and dissection by the community are needed.

Purpose: Foundation models are developed for general purposes, whereas clinical utility depends on stable delivery of reliable performance. This study aims to elucidate the potential advantages and limitations of deploying foundation models in clinical use by assessing the performance of the off‐the‐shelf medical foundation model MedSAM for the segmentation of anatomical structures in pelvic MR images. We also explore simple remedies by evaluating the dependency on the prompting scheme. Finally, we demonstrate the need for, and the performance gain from, further specialized fine‐tuning.

Methods: MedSAM and its lightweight version LiteMedSAM were evaluated out‐of‐the‐box on a public MR dataset consisting of 589 pelvic images split 80:20 for training and testing. An nnU‐Net model was trained from scratch to serve as a benchmark and to provide bounding box prompts for MedSAM. MedSAM was evaluated using bounding boxes of different quality: those derived from ground truth labels, those derived from nnU‐Net, and those derived from the former two but with a 5‐pixel isometric expansion. Lastly, LiteMedSAM was refined on the training set and reevaluated on this task.

Results: Out‐of‐the‐box MedSAM and LiteMedSAM both performed poorly across the structure set, especially for disjoint or non‐convex structures. Varying the prompt with different bounding box inputs had minimal effect. For example, the mean Dice scores and mean Hausdorff distances (in mm) for the obturator internus using MedSAM and LiteMedSAM were {0.251 ± 0.110, 0.101 ± 0.079} and {34.142 ± 5.196, 33.688 ± 5.306}, respectively. Fine‐tuning of LiteMedSAM led to a significant performance gain, improving the Dice score and Hausdorff distance for the obturator internus to 0.864 ± 0.123 and 5.022 ± 10.684 mm, on par with nnU‐Net, with no significant difference for most structures. All segmented structures benefited significantly from specialized refinement, with varying margins of improvement.

Conclusion: While our study points to the potential of deep learning models like MedSAM and LiteMedSAM for medical segmentation, it highlights the need for specialized refinement and adjudication. Off‐the‐shelf use of such large foundation models is highly likely to be suboptimal, and specialized fine‐tuning is often necessary to achieve clinically desired accuracy and stability.
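The box prompts and the Dice metric described above can be illustrated with a minimal sketch. This is not the authors' code. It assumes 2D binary masks stored as NumPy arrays and an (x_min, y_min, x_max, y_max) box convention, and the function names `bbox_from_mask` and `dice_score` are hypothetical. The Hausdorff distance is not covered here.

```python
import numpy as np


def bbox_from_mask(mask, margin=0):
    """Tight box around the foreground of a 2D binary mask.

    Returns (x_min, y_min, x_max, y_max), grown by `margin` pixels on
    every side and clipped to the image, or None for an empty mask.
    The box convention is an assumption, not taken from the paper.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    height, width = mask.shape
    x_min = max(int(xs.min()) - margin, 0)
    y_min = max(int(ys.min()) - margin, 0)
    x_max = min(int(xs.max()) + margin, width - 1)
    y_max = min(int(ys.max()) + margin, height - 1)
    return x_min, y_min, x_max, y_max


def dice_score(pred, truth):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = int(pred.sum()) + int(truth.sum())
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return 2.0 * int(np.logical_and(pred, truth).sum()) / denom


# Toy example: a 5 x 4 pixel structure inside a 20 x 20 slice.
truth = np.zeros((20, 20), dtype=np.uint8)
truth[5:10, 8:12] = 1

tight_box = bbox_from_mask(truth)               # box from the label itself
expanded_box = bbox_from_mask(truth, margin=5)  # 5-pixel isometric expansion

# A prediction shifted two pixels to the right overlaps half of the label.
pred = np.zeros_like(truth)
pred[5:10, 10:14] = 1
overlap = dice_score(pred, truth)
```

Passing an nnU‐Net prediction instead of the ground truth label to `bbox_from_mask` would give the second family of prompts, under the same assumptions.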

Necessity and Impact of Specialization of Large Foundation Model for Medical Segmentation Tasks
