ClickSAM: Fine-tuning Segment Anything Model using click prompts for ultrasound image segmentation

Aimee Guo, Grace Fei, Hemanth Pasupuleti, Jing Wang
2024-02-25
Abstract: The newly released Segment Anything Model (SAM) is a popular tool used in image processing due to its superior segmentation accuracy, variety of input prompts, training capabilities, and efficient model design. However, its current model is trained on a diverse dataset not tailored to medical images, particularly ultrasound images. Ultrasound images tend to have a lot of noise, making it difficult to segment out important structures. In this project, we developed ClickSAM, which fine-tunes the Segment Anything Model using click prompts for ultrasound images. ClickSAM has two stages of training: the first stage is trained on single-click prompts centered in the ground-truth contours, and the second stage focuses on improving the model performance through additional positive and negative click prompts. By comparing the first stage predictions to the ground-truth masks, true positive, false positive, and false negative segments are calculated. Positive clicks are generated using the true positive and false negative segments, and negative clicks are generated using the false positive segments. The Centroidal Voronoi Tessellation algorithm is then employed to collect positive and negative click prompts in each segment that are used to enhance the model performance during the second stage of training. With click-train methods, ClickSAM exhibits superior performance compared to other existing models for ultrasound image segmentation.
Computer Vision and Pattern Recognition, Artificial Intelligence, Medical Physics
What problem does this paper attempt to address?
The main goal of this paper is to improve the accuracy of ultrasound image segmentation, particularly for the diagnosis of breast cancer. The authors examine the limitations of the Segment Anything Model (SAM) on medical images, especially ultrasound images. Although SAM performs well on general image segmentation, its training data is diverse but not tailored to medical images, so it segments important structures poorly in ultrasound images, which contain a significant amount of noise.
To address this, the paper introduces ClickSAM, a method that fine-tunes SAM with click prompts specifically for ultrasound image segmentation. Training is divided into two stages. The first stage trains on a single click prompt placed at the center of each ground-truth contour. The second stage improves the model further by adding positive and negative click prompts derived from the first-stage predictions. Positive clicks are generated from true positive segments (correctly identified areas) and false negative segments (areas that should have been identified but were not), while negative clicks are generated from false positive segments (incorrectly identified areas). The Centroidal Voronoi Tessellation algorithm is used to place the positive and negative click prompts within each segment.
The paper compares ClickSAM with existing models, namely MedSAM and Segmentation Click Train. Experimental results show that ClickSAM achieves an average Intersection over Union (IoU) of 0.916 on breast ultrasound images, compared with 0.863 for MedSAM and 0.707 for Segmentation Click Train. These results support the effectiveness of fine-tuning SAM with click prompts, especially for complex shapes and segmentation targets that are not axis-aligned.
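The second-stage click generation and the IoU metric described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' implementation: the function names (`error_regions`, `iou`, `cvt_clicks`) are hypothetical, masks are plain lists of 0/1 rows, and the Centroidal Voronoi Tessellation is approximated with Lloyd's algorithm over the pixels of a segment, which is one common way to compute it. The number of clicks per segment and the iteration count are assumptions, since this summary does not state them.

```python
from typing import List, Tuple

Mask = List[List[int]]   # binary mask stored as rows of 0/1
Point = Tuple[int, int]  # (row, col) pixel coordinate


def error_regions(pred: Mask, gt: Mask) -> Tuple[Mask, Mask, Mask]:
    """Return (true-positive, false-positive, false-negative) masks."""
    tp = [[int(p and g) for p, g in zip(pr, gr)] for pr, gr in zip(pred, gt)]
    fp = [[int(p and not g) for p, g in zip(pr, gr)] for pr, gr in zip(pred, gt)]
    fn = [[int(g and not p) for p, g in zip(pr, gr)] for pr, gr in zip(pred, gt)]
    return tp, fp, fn


def area(mask: Mask) -> int:
    """Number of foreground pixels in a mask."""
    return sum(sum(row) for row in mask)


def iou(pred: Mask, gt: Mask) -> float:
    """Intersection over Union; two empty masks count as a perfect match."""
    tp, fp, fn = error_regions(pred, gt)
    union = area(tp) + area(fp) + area(fn)
    return area(tp) / union if union else 1.0


def _dist2(a, b) -> float:
    """Squared Euclidean distance between two (row, col) points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def cvt_clicks(mask: Mask, k: int, iters: int = 20) -> List[Point]:
    """Place up to k clicks in a segment with Lloyd's algorithm (assumed CVT solver)."""
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pixels or k <= 0:
        return []
    k = min(k, len(pixels))
    # Deterministic start: evenly spaced pixels in raster order.
    sites = [pixels[(i * len(pixels)) // k] for i in range(k)]
    for _ in range(iters):
        # Voronoi step: assign every pixel to its nearest site.
        cells = [[] for _ in sites]
        for p in pixels:
            nearest = min(range(k), key=lambda i: _dist2(p, sites[i]))
            cells[nearest].append(p)
        # Centroid step: move each site to the mean of its cell.
        sites = [
            (sum(r for r, _ in cell) / len(cell), sum(c for _, c in cell) / len(cell))
            if cell else site
            for cell, site in zip(cells, sites)
        ]
    # Snap each site to the closest pixel so every click lies inside the segment.
    return [min(pixels, key=lambda p: _dist2(p, s)) for s in sites]


# Toy example: the prediction is shifted one column to the right of the truth.
gt = [[0, 0, 0, 0],
      [0, 1, 1, 0],
      [0, 1, 1, 0],
      [0, 0, 0, 0]]
pred = [[0, 0, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 0, 0]]

tp, fp, fn = error_regions(pred, gt)
positive_clicks = cvt_clicks(tp, k=1) + cvt_clicks(fn, k=1)  # inside the true object
negative_clicks = cvt_clicks(fp, k=1)                        # inside the over-segmented area
score = iou(pred, gt)
```

In this toy case the positive clicks fall inside the ground-truth object and the negative click falls in the region the first-stage prediction wrongly included, which is the signal the second training stage is described as using.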
Compared with traditional bounding-box prompts, click prompts provide more precise coordinate information and reduce the inclusion of mis-segmented areas. For future work, the authors plan to integrate the concept of PseudoClick, which automatically predicts false positives and false negatives and generates click prompts without user-provided hints, further increasing the level of automation. They also expect to extend ClickSAM to other medical imaging modalities, such as MRI or CT scans, and to the diagnosis of other diseases, with the aim of a general-purpose medical image segmentation model.
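The contrast with bounding-box prompts can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions, not a result from the paper: the helper names (`bounding_box`, `box_background_fraction`) and the toy masks are hypothetical. It measures how much of an object's tight bounding box is background, which is the area a box prompt hands to the model along with the object.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary mask stored as rows of 0/1


def bounding_box(mask: Mask) -> Tuple[int, int, int, int]:
    """Tight axis-aligned box (row_min, col_min, row_max, col_max) around the object."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    if not rows:
        raise ValueError("mask contains no foreground pixels")
    return min(rows), min(cols), max(rows), max(cols)


def box_background_fraction(mask: Mask) -> float:
    """Fraction of the tight bounding box that is NOT part of the object."""
    r0, c0, r1, c1 = bounding_box(mask)
    box_area = (r1 - r0 + 1) * (c1 - c0 + 1)
    object_area = sum(sum(row) for row in mask)
    return 1.0 - object_area / box_area


# An axis-aligned rectangle fills its box completely.
rectangle = [[1] * 5 for _ in range(3)]

# A thin diagonal band (not axis-aligned) leaves most of its box as background.
diagonal = [[int(abs(r - c) <= 1) for c in range(8)] for r in range(8)]

rect_fraction = box_background_fraction(rectangle)
diag_fraction = box_background_fraction(diagonal)
```

For the diagonal band, most of the box is background, whereas a click placed on the band itself refers only to object pixels. This is consistent with the summary's point that click prompts suit non-axis-aligned shapes.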