Interpretable Machine Learning to Predict the Malignancy Risk of Follicular Thyroid Neoplasms in the Extremely Unbalanced Data: Experiences from a Real-World Study and Literature Review (Preprint)
Rui Shan,Xin Li,Jing Chen,Zheng Chen,Yuan-Jia Cheng,Bo Han,Run-Ze Hu,Jiu-Ping Huang,Gui-Lan Kong,Hui Liu,Fang Mei,Shi-Bing Song,Bang-Kai Sun,Hui Tian,Yang Wang,Wu-Cai Xiao,Xiang-Yun Yao,Jing-Ming Ye,Bo Yu,Chun-Hui Yuan,Fan Zhang,Zheng Liu
DOI: https://doi.org/10.2196/66269
2024-01-01
JMIR Cancer
Abstract: Diagnosis and treatment of follicular thyroid neoplasms (FTNs) remain a major challenge because the malignancy risk of an FTN cannot be ascertained until diagnostic surgery. We aimed to use interpretable machine learning to predict the malignancy risk of FTNs before surgery in a real-world setting. We conducted a retrospective cohort study at Peking University Third Hospital in Beijing, China, between January 2012 and September 2023. We included patients with a postoperative pathological diagnosis of follicular thyroid adenoma (FTA) or follicular thyroid carcinoma (FTC) and excluded those without thyroid ultrasonography before surgery. We used 22 predictors covering demographic characteristics, thyroid sonography, and hormone levels to train 5 machine learning models, and we chose the optimal model based on its discrimination, calibration, interpretability, and parsimony. Because the data were extremely unbalanced (the ratio of FTA to FTC was much greater than 1), we assessed model discrimination with both the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). We also summarized the lessons learned from our findings and from a literature review of the existing evidence. The cohort included 1543 patients (mean age 47.98 ± 14.14 years; 73% female) with 1676 FTN tumors (FTA: n = 1418; FTC: n = 258; FTA-to-FTC ratio: 5.5). The random forest was chosen as the optimal model, with an AUROC of 0.79 (95% CI, 0.77-0.80) and an AUPRC of 0.40 (95% CI, 0.38-0.42); its five most important predictors for discriminating FTA from FTC were mean TSH score, mean tumor diameter, mean TSH, margin, and TSH instability. The effect of mean diameter on malignancy risk varied with TSH instability: malignancy risk was elevated in larger tumors with greater TSH instability but reduced in larger tumors with more stable TSH levels.
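The abstract reports AUROC and AUPRC side by side for the unbalanced FTA/FTC data. One reading aid is that the two metrics have different chance levels: an uninformative classifier scores about 0.5 on AUROC regardless of class balance, whereas its AUPRC equals the prevalence of the positive class, here 258/1676 ≈ 0.154 if FTC is treated as the positive class. The following is a minimal pure-Python sketch, not the study's analysis code. It computes AUROC from the pairwise (Mann-Whitney) definition and AUPRC as step-wise average precision, the definition used by scikit-learn's `average_precision_score`. The function names and the FTC = 1 labeling are assumptions for illustration.

```python
# Minimal sketch: AUROC vs. AUPRC (average precision) on data with the
# same class counts as the cohort (1418 FTA, 258 FTC). Pure Python, no
# dependencies; an illustration, not the study's analysis code.
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative; ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC: sum of (recall gain) * precision over descending thresholds."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Cases with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # Label 1 = FTC (minority class), 0 = FTA, using the cohort's tumor counts.
    labels = [0] * 1418 + [1] * 258
    # An uninformative model that gives every tumor the same score.
    flat = [0.5] * len(labels)
    print(f"prevalence: {258 / 1676:.3f}")
    print(f"chance AUROC: {auroc(labels, flat):.3f}")
    print(f"chance AUPRC: {average_precision(labels, flat):.3f}")
```

Run as a script, it prints a prevalence of 0.154, a chance AUROC of 0.500, and a chance AUPRC of 0.154. Against these baselines, the reported AUPRC of 0.40 is roughly 2.6 times the chance level, while the AUROC of 0.79 sits 0.29 above 0.5.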
The risk of malignancy tended to increase nonlinearly with a larger mean tumor diameter or greater TSH instability, and to decrease nonlinearly with a higher mean TSH score or mean TSH level. Small FTCs (mean diameter: 2.89 cm) were more likely than larger ones to be misclassified as FTAs. Our literature review showed that (1) the FTA-to-FTC ratio in previous studies ranged from 0.6 to 4.0, lower than the natural ratio of 5.0; (2) no study had used the AUPRC to assess prediction performance in unbalanced data; and (3) a previously derived model performed worse in external validation than in its original study. TSH measurements and tumor size are important for screening the malignancy risk of FTNs before diagnostic surgery. However, assessing the malignancy risk of small FTNs remains challenging. Future studies are needed to overcome the prediction challenge posed by the extremely unbalanced distribution of FTA and FTC in real-world data. This study was approved by the Medical Research Ethics Committee of Peking University Third Hospital (No. IRB00006761-M2023168).
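The review finding that earlier studies used FTA-to-FTC ratios of 0.6 to 4.0, below the natural 5.0, bears on interpretation because the chance level of AUPRC depends directly on case mix. A minimal sketch, assuming FTC is the positive class, converts a ratio r into the FTC prevalence 1/(1 + r), which is the AUPRC of an uninformative model. The helper name `chance_auprc` is hypothetical.

```python
# Sketch: how the FTA-to-FTC ratio sets the chance-level AUPRC.
# Assumption: FTC is the positive class; the function name is hypothetical.


def chance_auprc(fta_to_ftc_ratio: float) -> float:
    """AUPRC of an uninformative model = FTC prevalence = 1 / (1 + FTA:FTC ratio)."""
    if fta_to_ftc_ratio < 0:
        raise ValueError("ratio must be non-negative")
    return 1.0 / (1.0 + fta_to_ftc_ratio)


if __name__ == "__main__":
    settings = [
        ("lowest ratio in reviewed studies", 0.6),
        ("highest ratio in reviewed studies", 4.0),
        ("natural distribution", 5.0),
        ("this cohort (1418/258)", 1418 / 258),
    ]
    for name, ratio in settings:
        # Chance-level AUROC stays 0.5 no matter the ratio; AUPRC does not.
        print(f"{name:<34} FTA:FTC = {ratio:4.1f}   chance AUPRC = {chance_auprc(ratio):.3f}")
```

For the four settings it gives chance AUPRCs of 0.625, 0.200, 0.167, and 0.154, while chance AUROC is 0.5 in every case. An AUPRC obtained on an FTC-enriched sample may therefore not be directly comparable with one obtained at real-world proportions.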