Cross-site Validation of AI Segmentation and Harmonization in Breast MRI

Yu Huang, Nicholas J Leotta, Lukas Hirsch, Roberto Lo Gullo, Mary Hughes, Jeffrey Reiner, Nicole B Saphier, Kelly S Myers, Babita Panigrahi, Emily Ambinder, Philip Di Carlo, Lars J Grimm, Dorothy Lowell, Sora Yoon, Sujata V Ghate, Lucas C Parra, Elizabeth J Sutton
DOI: https://doi.org/10.1007/s10278-024-01266-9
2024-09-25
Abstract: This work aims to perform a cross-site validation of automated segmentation of breast cancers in MRI and to compare its performance with that of radiologists. A three-dimensional (3D) U-Net was trained to segment cancers in dynamic contrast-enhanced axial MRIs using a large dataset from Site 1 (n = 15,266; 449 malignant and 14,817 benign). Performance was validated on site-specific test data from this site and two additional sites, as well as on common, publicly available test data. Four radiologists from each of the three clinical sites provided two-dimensional (2D) segmentations as ground truth. Segmentation performance did not differ between the network and radiologists on the test data from Sites 1 and 2 or on the common public data (median Dice score: Site 1, network 0.86 vs. radiologist 0.85, n = 114; Site 2, 0.91 vs. 0.91, n = 50; common, 0.93 vs. 0.90). For Site 3, an affine input layer was fine-tuned using segmentation labels, resulting in comparable performance between the network and radiologists (0.88 vs. 0.89, n = 42). Radiologist performance varied across readers on the common test data, and the network numerically outperformed 11 of the 12 radiologists (median Dice: 0.85–0.94, n = 20). In conclusion, a deep network with a novel supervised harmonization technique matches radiologists' performance in MRI tumor segmentation across clinical sites. We make code and weights publicly available to promote reproducible AI in radiology.
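All comparisons above are reported as median Dice scores. The Dice similarity coefficient between a predicted mask P and a reference mask T is 2|P ∩ T| / (|P| + |T|), ranging from 0 (no overlap) to 1 (identical masks). The sketch below is a minimal illustration, not the authors' released code. It assumes binary 2D masks stored as nested Python lists, matching the 2D radiologist segmentations. It also assumes that two empty masks score 1.0, a common convention that the abstract does not specify. The helper names `dice_score` and `median_dice` are hypothetical.

```python
from statistics import median


def dice_score(pred, truth):
    """Dice coefficient 2|P ∩ T| / (|P| + |T|) for two binary 2D masks.

    Masks are nested lists of 0/1 (or truthy/falsy) values with equal shape.
    Assumption: two empty masks are treated as perfect agreement (1.0).
    """
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    intersection = pred_sum = truth_sum = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            p, t = bool(p), bool(t)
            intersection += int(p and t)  # pixel labeled tumor in both masks
            pred_sum += int(p)            # pixels labeled tumor by the prediction
            truth_sum += int(t)           # pixels labeled tumor in the reference
    denom = pred_sum + truth_sum
    if denom == 0:
        return 1.0  # assumed convention for two empty masks
    return 2.0 * intersection / denom


def median_dice(pairs):
    """Median Dice over (pred, truth) mask pairs, one pair per test case."""
    return median([dice_score(p, t) for p, t in pairs])


# Usage: two 2x3 masks sharing 2 of their 3 tumor pixels each.
pred = [[0, 1, 1],
        [0, 1, 0]]
truth = [[0, 1, 0],
         [0, 1, 1]]
print(dice_score(pred, truth))  # 2 * 2 / (3 + 3) = 0.666...
```

Reporting the median rather than the mean, as the abstract does, keeps a few near-zero cases (for example, a missed small lesion) from dominating the per-site summary.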