Foundation versus Domain-Specific Model for Cardiac Ultrasound Segmentation

Chieh-Ju Chao, Yunqi Gu, Wasan Kumar, Tiange Xiang, Lalith Appari, Justin Wu, Juan M Farina, Rachael Wraith, Jiwoong Jeong, Reza Arsanjani, Kane C Garvan, Jae K Oh, Curtis P Langlotz, Imon Banerjee, Fei-Fei Li, Ehsan Adeli
DOI: https://doi.org/10.1101/2023.09.19.23295772
2024-10-18
Abstract:
Background: The vision foundation model "Segment Anything" (SAM) promises to segment any object in an image. However, the performance of SAM on clinical echocardiography images has yet to be investigated and compared against state-of-the-art models.
Method: SAM was fine-tuned on the training set of EchoNet-Dynamic (Stanford) and then evaluated on external datasets containing transthoracic echocardiography (TTE) and point-of-care ultrasound (POCUS) images, including CAMUS (University Hospital of St Etienne) and the Mayo Clinic dataset (a sample of 99 non-duplicated patients, 58 with TTE and 41 with POCUS). Fine-tuned SAM was evaluated against the EchoNet and MedSAM models using the Dice similarity coefficient (DSC). We further conducted an annotator study to evaluate the effectiveness of SAM in assisting clinical segmentation tasks.
Results: Fine-tuned SAM was superior to EchoNet and MedSAM on most of the datasets. Compared with EchoNet, the fine-tuned SAM model showed strong generalization capacity, especially on apical 2-chamber (A2C) images (CAMUS-A2C: DSC 0.891 +/- 0.040 vs. 0.752 +/- 0.196, p<0.0001) and POCUS images (DSC 0.857 +/- 0.047 vs. 0.667 +/- 0.279, p<0.0001). SAM assistance also reduced annotation time by 50% (11.6 +/- 4.5 sec vs. 5.7 +/- 1.7 sec, p<0.0001) while maintaining segmentation quality.
Conclusions: Our approach demonstrates an effective strategy for fine-tuning a vision foundation model, enhancing clinical workflow efficiency through human-artificial intelligence (AI) collaboration, and leveraging its generalization capacity across ultrasound modalities.
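For readers unfamiliar with the evaluation metric, the Dice similarity coefficient between a predicted mask P and a reference mask T is conventionally defined as 2|P ∩ T| / (|P| + |T|). The snippet below is a minimal illustrative sketch of that standard definition, not the authors' evaluation code. The function name `dice_coefficient`, the nested-list mask format, and the choice to return 1.0 when both masks are empty are assumptions made for this example.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient for two binary masks.

    Masks are nested lists of 0/1 values with identical shape
    (an assumed format for this illustration).
    """
    # Flatten both masks into lists of booleans.
    pred_flat = [bool(v) for row in pred for v in row]
    truth_flat = [bool(v) for row in truth for v in row]
    if len(pred_flat) != len(truth_flat):
        raise ValueError("masks must contain the same number of pixels")

    # |P ∩ T|: pixels labeled foreground in both masks.
    intersection = sum(p and t for p, t in zip(pred_flat, truth_flat))
    # |P| + |T|: total foreground pixels across the two masks.
    total = sum(pred_flat) + sum(truth_flat)

    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# Toy 2x3 masks: 3 foreground pixels each, 2 of them shared.
pred = [[0, 1, 1],
        [0, 1, 0]]
truth = [[0, 1, 0],
         [0, 1, 1]]
score = dice_coefficient(pred, truth)
print(round(score, 3))
```

A DSC of 1.0 indicates perfect overlap and 0.0 indicates none, so the reported values (for example 0.891 vs. 0.752 on CAMUS-A2C) can be read as average per-image overlap with the reference segmentation.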
Cardiovascular Medicine