Clinical value of radiomics and machine learning in breast ultrasound: a multicenter study for differential diagnosis of benign and malignant lesions
Valeria Romeo, Renato Cuocolo, Roberta Apolito, Arnaldo Stanzione, Antonio Ventimiglia, Annalisa Vitale, Francesco Verde, Antonello Accurso, Michele Amitrano, Luigi Insabato, Annarita Gencarelli, Roberta Buonocore, Maria Rosaria Argenzio, Anna Maria Cascone, Massimo Imbriaco, Simone Maurea, Arturo Brunetti
DOI: https://doi.org/10.1007/s00330-021-08009-2
IF: 7.034
2021-05-21
European Radiology
Abstract:
Objectives: We aimed to assess the performance of radiomics and machine learning (ML) for classification of non-cystic benign and malignant breast lesions on ultrasound images, compare ML’s accuracy with that of a breast radiologist, and verify whether the radiologist’s performance is improved by using ML.
Methods: Our retrospective study included patients from two institutions. A total of 135 lesions from Institution 1 were used to train and test the ML model with cross-validation. Radiomic features were extracted from manually annotated images and underwent a multistep feature selection process in which non-reproducible, low-variance, and highly intercorrelated features were removed from the dataset. Then, 66 lesions from Institution 2 were used as an external test set for ML and to assess the performance of a radiologist without and with the aid of ML, using McNemar’s test.
Results: After feature selection, 10 of the 520 extracted features were used to train a random forest algorithm. Its accuracy in the training set was 82% (standard deviation, SD, ± 6%), with an AUC of 0.90 (SD ± 0.06), while on the test set it was 82% (95% confidence interval, CI, 70–90%), with an AUC of 0.82 (95% CI, 0.70–0.93). The classifier was significantly better than the baseline reference (p = 0.0098), but not significantly different from the radiologist (79.4%, p = 0.815). The radiologist’s performance improved when using ML (80.2%), but not significantly (p = 0.508).
Conclusions: A radiomic analysis combined with ML showed promising results in differentiating benign from malignant breast lesions on ultrasound images.
Key Points:
• Machine learning showed good accuracy in discriminating benign from malignant breast lesions
• The machine learning classifier’s performance was comparable to that of a breast radiologist
• The radiologist’s accuracy improved with machine learning, but not significantly
radiology, nuclear medicine & medical imaging
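
Illustrative note on the paired comparison: the abstract reports that the radiologist's readings without and with ML support were compared using McNemar’s test on the same 66 external lesions. The sketch below shows how such a paired test can be computed from per-lesion correctness flags. It is a minimal example under stated assumptions, not the authors' code: the abstract does not say whether the exact binomial or the chi-square form was used (the exact two-sided form is assumed here), the function name `mcnemar_exact` is hypothetical, and the per-lesion outcomes are invented purely to make the example runnable.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar test on paired correctness flags.

    correct_a / correct_b: equal-length sequences of booleans, one entry per
    lesion, True when that reading classified the lesion correctly.
    Returns (b, c, p_value), where b and c are the discordant-pair counts.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("paired inputs must have equal length")
    # Only discordant pairs carry information about a difference in accuracy.
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs: no evidence of a difference
    # Under H0 each discordant pair falls on either side with probability 0.5,
    # so the smaller count follows Binomial(n, 0.5); double the lower tail.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical outcomes for 66 lesions (NOT the study data):
# 50 correct in both readings, 4 correct only without ML,
# 2 correct only with ML, 10 incorrect in both.
reading_without_ml = [True] * 50 + [True] * 4 + [False] * 2 + [False] * 10
reading_with_ml = [True] * 50 + [False] * 4 + [True] * 2 + [False] * 10

b, c, p_value = mcnemar_exact(reading_without_ml, reading_with_ml)
print(f"discordant pairs: b={b}, c={c}, p={p_value:.4f}")
```

Because the test depends only on the discordant pairs, two readings with nearly identical overall accuracy (as with 79.4% and 80.2% in the abstract) will typically yield a large p-value unless their disagreements are strongly one-sided.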