Ultrasound-based Deep Learning in the Establishment of a Breast Lesion Risk Stratification System: a Multicenter Study

Yang Gu, Wen Xu, Ting Liu, Xing An, Jiawei Tian, Haitao Ran, Weidong Ren, Cai Chang, Jianjun Yuan, Chunsong Kang, Youbin Deng, Hui Wang, Baoming Luo, Shenglan Guo, Qi Zhou, Ensheng Xue, Weiwei Zhan, Qing Zhou, Jie Li, Ping Zhou, Man Chen, Ying Gu, Wu Chen, Yuhong Zhang, Jianchu Li, Longfei Cong, Lei Zhu, Hongyan Wang, Yuxin Jiang
DOI: https://doi.org/10.1007/s00330-022-09263-8
IF: 7.034
2022-01-01
European Radiology
Abstract:
Objectives: To establish a breast lesion risk stratification system that uses ultrasound images to predict breast malignancy and assign Breast Imaging Reporting and Data System (BI-RADS) categories simultaneously.
Methods: This multicenter study prospectively collected ultrasound images from 5012 patients at thirty-two hospitals between December 2018 and December 2020. A deep learning (DL) model was developed to perform binary categorization (benign vs. malignant) and BI-RADS categorization (categories 2, 3, 4a, 4b, 4c, and 5) simultaneously. The training set (4212 patients) and the internal test set (416 patients) came from thirty hospitals. The remaining two hospitals (384 patients) formed the external test set. Three experienced radiologists performed a reader study on 324 patients randomly selected from the test sets. The performance of the DL model was compared with that of each of the three radiologists and with their consensus.
Results: In the external test set, the DL model achieved areas under the receiver operating characteristic curve (AUCs) of 0.980 for binary categorization and 0.945 for six-way BI-RADS categorization. In the reader study set, the DL BI-RADS categories achieved a similar AUC (0.901 vs. 0.933, p = 0.0632), sensitivity (90.98% vs. 95.90%, p = 0.1094), and accuracy (83.33% vs. 79.01%, p = 0.0541) compared with the consensus of the three radiologists. The DL specificity was higher (78.71% vs. 68.81%, p = 0.0012).
Conclusions: The DL model distinguished benign from malignant breast lesions well and yielded outcomes similar to those of experienced radiologists. This indicates its potential applicability in clinical diagnosis.
Key Points
• The DL model can simultaneously perform binary categorization of breast lesions (benign vs. malignant) and six-way BI-RADS categorization (categories 2, 3, 4a, 4b, 4c, and 5).
• The DL model showed acceptable agreement with radiologists in classifying breast lesions.
• The DL model distinguished benign from malignant breast lesions well and showed promise in helping reduce unnecessary biopsies of BI-RADS 4a lesions.
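The abstract reports AUC, sensitivity, specificity, and accuracy for BI-RADS categories. It does not state how an ordinal category is turned into a benign/malignant call. The following is a minimal Python sketch of one common convention. It assumes, and the abstract does not confirm, that a lesion counts as test-positive when its category is 4a or higher. It also treats the six categories as an ordinal score and computes AUC with the pairwise (Mann-Whitney) formulation, counting ties as one half. The names `BIRADS_ORDER`, `binary_metrics`, and `ordinal_auc`, and the toy data, are hypothetical and are not taken from the study.

```python
"""Hypothetical sketch: scoring ordinal BI-RADS categories against reference labels."""
from typing import Dict, Sequence

# Ordinal rank of each BI-RADS category used in the study (2 < 3 < 4a < 4b < 4c < 5).
BIRADS_ORDER: Dict[str, int] = {"2": 0, "3": 1, "4a": 2, "4b": 3, "4c": 4, "5": 5}


def binary_metrics(
    categories: Sequence[str], labels: Sequence[int], threshold: str = "4a"
) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy when a category >= threshold is called malignant.

    labels: 1 = malignant, anything else = benign (assumed reference standard,
    e.g. pathology). The default threshold of 4a is an assumption, not the paper's rule.
    """
    if len(categories) != len(labels):
        raise ValueError("categories and labels must have the same length")
    cut = BIRADS_ORDER[threshold]
    tp = fn = tn = fp = 0
    for cat, y in zip(categories, labels):
        predicted_malignant = BIRADS_ORDER[cat] >= cut
        if y == 1:
            if predicted_malignant:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_malignant:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one malignant and one benign lesion")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
    }


def ordinal_auc(categories: Sequence[str], labels: Sequence[int]) -> float:
    """AUC of an ordinal score: P(malignant ranked above benign), ties counted as 1/2."""
    pos = [BIRADS_ORDER[c] for c, y in zip(categories, labels) if y == 1]
    neg = [BIRADS_ORDER[c] for c, y in zip(categories, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign lesion")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, invented for illustration only (not from the study).
    cats = ["2", "3", "4a", "4b", "4c", "5", "3", "4a"]
    labels = [0, 0, 0, 1, 1, 1, 1, 0]
    print(binary_metrics(cats, labels))  # {'sensitivity': 0.75, 'specificity': 0.5, 'accuracy': 0.625}
    print(ordinal_auc(cats, labels))     # 0.84375
```

BI-RADS offers only six ordinal levels, so many malignant/benign pairs tie, and the tie rule noticeably affects the AUC. A model's continuous malignancy probability would generally give a finer-grained ROC curve than its BI-RADS output.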