Increasing SAM Zero-Shot Performance on Multimodal Medical Images Using GPT-4 Generated Descriptive Prompts Without Human Annotation

Zekun Jiang, Dongjie Cheng, Ziyuan Qin, Jun Gao, Qicheng Lao, Kang Li, Le Zhang

DOI: https://doi.org/10.48550/arXiv.2402.15759

2024-02-24

Computer Vision and Pattern Recognition

Abstract: This study develops and evaluates a novel multimodal medical image zero-shot segmentation algorithm, Text-Visual-Prompt SAM (TV-SAM), that requires no manual annotations. TV-SAM integrates the large language model GPT-4, the vision-language model GLIP, and the Segment Anything Model (SAM) to autonomously generate descriptive text prompts and visual bounding box prompts from medical images, thereby enhancing SAM for zero-shot segmentation. Comprehensive evaluations on seven public datasets encompassing eight imaging modalities demonstrate that TV-SAM can effectively segment unseen targets across various modalities without additional training. It significantly outperforms SAM AUTO and GSAM, closely matches the performance of SAM BBOX with gold standard bounding box prompts, and surpasses the state of the art on specific datasets such as ISIC and WBC. The study indicates that TV-SAM serves as an effective multimodal medical image zero-shot segmentation algorithm and highlights the significant contribution of GPT-4 to zero-shot segmentation. Integrating foundation models such as GPT-4, GLIP, and SAM could enhance the capability to address complex problems in specialized domains. The code is available at: https://github.com/JZK00/TV-SAM.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper aims to address the issue of zero-shot segmentation in multimodal medical images. Specifically:

1. **Proposing a New Zero-Shot Segmentation Algorithm**: The research team developed a new algorithm called Text-Visual-Prompt SAM (TV-SAM). This algorithm combines the large language model GPT-4, the vision-language model GLIP, and the Segment Anything Model (SAM) to achieve automatic text and visual prompt generation without manual annotation.
2. **Enhancing SAM's Performance in Zero-Shot Segmentation**: By integrating GPT-4 to automatically generate descriptive text prompts and visual bounding box prompts, the performance of SAM in zero-shot segmentation tasks is enhanced.
3. **Validating the Algorithm's Effectiveness**: The study conducted extensive evaluations on 7 public datasets, covering 8 different medical imaging modalities. It demonstrated that TV-SAM can effectively segment unseen targets and outperformed current state-of-the-art methods on specific datasets such as ISIC and WBC.

In summary, this paper attempts to solve the challenge of zero-shot segmentation in multimodal medical images by combining multiple foundational models, thereby improving segmentation accuracy and efficiency.
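The three-model flow described above (a language model writes a descriptive text prompt, a grounding model turns it into bounding boxes, and a segmenter produces a mask from a box) can be sketched as plain function composition. This is a minimal illustration under stated assumptions, not the authors' implementation: `run_pipeline`, `Detection`, and the `fake_*` stand-ins are hypothetical names, no real GPT-4, GLIP, or SAM call is made, and choosing the single highest-scoring box is an assumption of this sketch rather than something stated in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), end-exclusive, in pixels
Image = List[List[int]]           # toy grayscale image: rows of pixel values
Mask = List[List[bool]]


@dataclass
class Detection:
    box: Box
    score: float


def run_pipeline(
    image: Image,
    modality: str,
    target: str,
    describe: Callable[[str, str], str],             # stands in for the language model
    ground: Callable[[Image, str], List[Detection]],  # stands in for the grounding model
    segment: Callable[[Image, Box], Mask],            # stands in for the segmenter
) -> Optional[Mask]:
    """Chain text prompt -> box prompt -> mask, with no human annotation."""
    text_prompt = describe(modality, target)
    detections = ground(image, text_prompt)
    if not detections:
        return None  # nothing was grounded, so there is no box prompt to pass on
    # Assumption of this sketch: keep only the highest-scoring box.
    best = max(detections, key=lambda d: d.score)
    return segment(image, best.box)


# Toy stand-ins so the sketch runs without any model weights or network access.
def fake_describe(modality: str, target: str) -> str:
    return f"{target} in a {modality} image"


def fake_ground(image: Image, text_prompt: str) -> List[Detection]:
    return [Detection((0, 0, 1, 1), 0.2), Detection((1, 1, 3, 3), 0.9)]


def fake_segment(image: Image, box: Box) -> Mask:
    # Marks every pixel inside the box; a real segmenter would refine this.
    x0, y0, x1, y1 = box
    return [
        [x0 <= x < x1 and y0 <= y < y1 for x in range(len(row))]
        for y, row in enumerate(image)
    ]


if __name__ == "__main__":
    image = [[0] * 4 for _ in range(4)]
    mask = run_pipeline(
        image, "dermoscopy", "skin lesion", fake_describe, fake_ground, fake_segment
    )
    print(sum(cell for row in mask for cell in row))  # count of masked pixels
```

Passing the three stages in as callables keeps the control flow visible and lets each stand-in be swapped for a real model wrapper later; the actual prompts, thresholds, and box selection used by TV-SAM are in the linked repository.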

TV-SAM: Increasing Zero-Shot Segmentation Performance on Multimodal Medical Images Using GPT-4 Generated Descriptive Prompts Without Human Annotation

Zero-shot performance of the Segment Anything Model (SAM) in 2D medical imaging: A comprehensive evaluation and practical guidelines

SAM on Medical Images: A Comprehensive Study on Three Prompt Modes

SAM.MD: Zero-shot medical image segmentation capabilities of the Segment Anything Model

SimSAM: Zero-shot Medical Image Segmentation via Simulated Interaction

$\mathrm{SAM^{Med}}$: A medical image annotation framework based on large vision model

Segment Anything Model (SAM) for Digital Pathology: Assess Zero-shot Segmentation on Whole Slide Imaging

No More Training: SAM's Zero-Shot Transfer Capabilities for Cost-Efficient Medical Image Segmentation

AGSAM: Agent-Guided Segment Anything Model for Automatic Segmentation in Few-Shot Scenarios

Segment Anything Model for Medical Image Analysis: an Experimental Study

Med-PerSAM: One-Shot Visual Prompt Tuning for Personalized Segment Anything Model in Medical Domain

CC-SAM: SAM with Cross-feature Attention and Context for Ultrasound Image Segmentation

SAM3D: Zero-Shot Semi-Automatic Segmentation in 3D Medical Images with the Segment Anything Model

SAM-MPA: Applying SAM to Few-shot Medical Image Segmentation using Mask Propagation and Auto-prompting

SAM-Med2D

K-SAM: A Prompting Method Using Pretrained U-Net to Improve Zero Shot Performance of SAM on Lung Segmentation in CXR Images

MA-SAM: Modality-agnostic SAM adaptation for 3D medical image segmentation

Generalist Vision Foundation Models for Medical Imaging: A Case Study of Segment Anything Model on Zero-Shot Medical Segmentation

All-in-SAM: from Weak Annotation to Pixel-wise Nuclei Segmentation with Prompt-based Finetuning

Interactive 3D Medical Image Segmentation with SAM 2