Automatic tumor segmentation and lymph node metastasis prediction in papillary thyroid carcinoma using ultrasound keyframes
Xian‐Ya Zhang,Di Zhang,Zhi‐Yuan Wang,Jun Chen,Jia‐Yu Ren,Ting Ma,Jian‐Jun Lin,Christoph F. Dietrich,Xin‐Wu Cui
DOI: https://doi.org/10.1002/mp.17498
IF: 4.506
2024-11-01
Medical Physics
Abstract: Background Accurate preoperative prediction of cervical lymph node metastasis (LNM) in patients with papillary thyroid carcinoma (PTC) is essential for disease staging and individualized treatment planning, which can improve prognosis and facilitate better management.
Purpose To establish a fully automated deep learning‐enabled model (FADLM) for tumor segmentation and cervical LNM prediction in PTC using ultrasound (US) video keyframes.
Methods This bicentral study retrospectively enrolled 518 PTC patients, who were randomly divided into the training (Hospital 1, n = 340), internal test (Hospital 1, n = 83), and external test (Hospital 2, n = 95) cohorts. The FADLM integrated a mask region‐based convolutional neural network (Mask R‐CNN) for automatic segmentation of the primary thyroid tumor and ResNet34 with a Bayes strategy for cervical LNM diagnosis. A radiomics model (RM) using the same automated segmentation method, a traditional radiomics model (TRM) using manual segmentation, and a clinical‐semantic model (CSM) were developed for comparison. The Dice similarity coefficient (DSC) was used to evaluate segmentation performance. The prediction performance of the models was validated in terms of discrimination and clinical utility using the area under the receiver operating characteristic curve (AUC), heatmap analysis, and decision curve analysis (DCA). Predictive performance was compared among models using the DeLong test. The performance of two radiologists, both compared with the FADLM and when assisted by it, was analyzed in terms of accuracy, sensitivity, and specificity using McNemar's χ² test. A p‐value less than 0.05 was considered statistically significant. The Benjamini‐Hochberg procedure was applied to control Type I error in multiple comparisons.
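As an illustration of two quantities named in the Methods, the following is a minimal sketch of the Dice similarity coefficient for a pair of binary masks and of Benjamini‐Hochberg adjusted p‐values. It is not the authors' implementation: the function names `dice_coefficient` and `benjamini_hochberg`, the convention for two empty masks, and the toy inputs are assumptions made for this example only.

```python
from typing import List, Sequence


def dice_coefficient(pred: Sequence[Sequence[int]],
                     truth: Sequence[Sequence[int]]) -> float:
    """Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), for binary masks."""
    intersection = 0
    pred_area = 0
    truth_area = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            pred_area += 1 if p else 0
            truth_area += 1 if t else 0
            intersection += 1 if (p and t) else 0
    if pred_area + truth_area == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / (pred_area + truth_area)


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is the
    # minimum of p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy inputs, invented for illustration (not data from the study).
example_pred = [[0, 1, 1], [0, 1, 0]]
example_truth = [[0, 1, 0], [0, 1, 1]]
example_dsc = dice_coefficient(example_pred, example_truth)
example_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

An adjusted p‐value below 0.05 would then be read as significant at a 5% false discovery rate, which matches the threshold stated above.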
Results The FADLM yielded promising segmentation results in the training (DSC: 0.88 ± 0.23), internal test (DSC: 0.88 ± 0.23), and external test cohorts (DSC: 0.85 ± 0.24). The AUCs of the FADLM for cervical LNM prediction in these three cohorts were 0.78 (95% CI: 0.73, 0.83), 0.83 (95% CI: 0.74, 0.92), and 0.83 (95% CI: 0.75, 0.92), respectively. In all three cohorts, the FADLM significantly outperformed the RM (AUCs: 0.78 vs. 0.72; 0.83 vs. 0.65; 0.83 vs. 0.68, all adjusted p‐values
radiology, nuclear medicine & medical imaging