A Short Review and Evaluation of SAM2's Performance in 3D CT Image Segmentation

Yufan He, Pengfei Guo, Yucheng Tang, Andriy Myronenko, Vishwesh Nath, Ziyue Xu, Dong Yang, Can Zhao, Daguang Xu, Wenqi Li
2024-08-21
Abstract: Since the release of Segment Anything 2 (SAM2), the medical imaging community has been actively evaluating its performance for 3D medical image segmentation. However, different studies have employed varying evaluation pipelines, resulting in conflicting outcomes that obscure a clear understanding of SAM2's capabilities and potential applications. We briefly review existing benchmarks and point out that the SAM2 paper clearly outlines a zero-shot evaluation pipeline, which simulates user clicks iteratively for up to eight iterations. We reproduced this interactive annotation simulation on 3D CT datasets and provide the results and code at https://github.com/Project-MONAI/VISTA. Our findings reveal that directly applying SAM2 to 3D medical imaging in a zero-shot manner is far from satisfactory. It is prone to generating false positives when foreground objects disappear, and annotating more slices cannot fully offset this tendency. For smaller, single-connected objects such as the kidney and aorta, SAM2 performs reasonably well, but for most organs it is still far behind state-of-the-art 3D annotation methods. More research and innovation are needed for the 3D medical imaging community to use SAM2 correctly.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper explores the performance and limitations of the Segment Anything Model 2 (SAM2) in 3D medical image segmentation. Specifically, it attempts to address the following core questions:

1. **Zero-shot performance evaluation**: Is SAM2 strong enough to solve real-world problems when applied to 3D medical image segmentation in a zero-shot scenario (i.e., without task-specific training)? How does it compare to current state-of-the-art methods?
2. **Research direction choice**: Should 3D medical imaging researchers shift towards using the SAM2 model itself, or merely make use of SAM2's dataset? If models such as 3D-UNet were trained on these video datasets, would they achieve better results?
3. **Automatic segmentation capability**: Medical image segmentation typically requires high-precision automatic segmentation models for large-scale cohort analysis. Do SAM2's architecture and pre-trained weights help in building state-of-the-art automatic segmentation models?

The paper tests SAM2 under a standardized evaluation protocol and finds that directly applying it to 3D medical image segmentation has numerous shortcomings, especially for complex organ structures. Although SAM2 is reasonably effective for certain smaller, connected objects (such as the kidneys and the aorta), its performance is significantly inferior to existing 3D annotation methods for most organs. The paper therefore highlights the need for further research and innovation before SAM2 can be applied effectively in 3D medical imaging.
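The iterative click simulation mentioned in the abstract (up to eight simulated user clicks) can be illustrated with a minimal sketch on a single 2D slice. This is not the authors' code and it does not call SAM2. The helper names (`next_click`, `simulate_clicks`, `make_toy_predictor`) are hypothetical, the click-placement rule (the error pixel nearest the centroid of the error region) is an assumption chosen for simplicity, and the toy predictor merely stands in for a promptable segmentation model.

```python
from typing import Callable, List, Optional, Tuple

Mask = List[List[int]]         # 2D binary mask: rows of 0/1
Click = Tuple[int, int, int]   # (row, col, label); 1 = positive, 0 = negative


def next_click(pred: Mask, gt: Mask) -> Optional[Click]:
    """Pick one corrective click from the current error region.

    Assumption: the click is the error pixel nearest to the centroid of all
    error pixels. Returns None when prediction and ground truth agree.
    """
    errors = [(r, c) for r, row in enumerate(gt)
              for c, g in enumerate(row) if g != pred[r][c]]
    if not errors:
        return None
    cr = sum(r for r, _ in errors) / len(errors)
    cc = sum(c for _, c in errors) / len(errors)
    r, c = min(errors, key=lambda p: (p[0] - cr) ** 2 + (p[1] - cc) ** 2)
    # Missed foreground gets a positive click; a false positive gets a negative one.
    return (r, c, gt[r][c])


def simulate_clicks(predict: Callable[[List[Click]], Mask], gt: Mask,
                    max_iters: int = 8) -> Tuple[Mask, List[Click]]:
    """Run the click loop: click on the error, re-predict, repeat."""
    pred: Mask = [[0] * len(gt[0]) for _ in gt]  # start from an empty mask
    clicks: List[Click] = []
    for _ in range(max_iters):
        click = next_click(pred, gt)
        if click is None:          # prediction already matches the ground truth
            break
        clicks.append(click)
        pred = predict(clicks)     # the model sees all clicks made so far
    return pred, clicks


def make_toy_predictor(h: int, w: int) -> Callable[[List[Click]], Mask]:
    """Hypothetical stand-in model: paint a 3x3 patch around each click."""
    def predict(clicks: List[Click]) -> Mask:
        mask = [[0] * w for _ in range(h)]
        for r, c, label in clicks:
            for rr in range(max(0, r - 1), min(h, r + 2)):
                for cc in range(max(0, c - 1), min(w, c + 2)):
                    mask[rr][cc] = label
        return mask
    return predict
```

In an actual evaluation, `predict` would wrap the promptable model under test, and the final mask would be scored against the ground truth after each iteration. The cap of eight iterations follows the abstract; everything else here is illustrative.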