Development and Validation of Machine Learning Models to Predict Epidermal Growth Factor Receptor Mutation in Non-Small Cell Lung Cancer: A Multi-Center Retrospective Radiomics Study
Yafeng Liu, Jiawei Zhou, Jing Wu, Wenyang Wang, Xueqin Wang, Jianqiang Guo, Qingsen Wang, Xin Zhang, Danting Li, Jun Xie, Xuansheng Ding, Yingru Xing, Dong Hu
DOI: https://doi.org/10.1177/10732748221092926
2022-01-01
Cancer Control
Abstract:
Objective: To develop and validate a generalized prediction model that classifies epidermal growth factor receptor (EGFR) mutation status in patients with non-small cell lung cancer.
Methods: A total of 346 patients from four centers (296 in the training cohort and 50 in the validation cohort) were included in this retrospective study. First, 1085 features were extracted from computed tomography images using IBEX. The features were screened using the intraclass correlation coefficient, hypothesis tests, and the least absolute shrinkage and selection operator. Logistic regression (LR), decision tree (DT), random forest (RF), and support vector machine (SVM) classifiers were then used to build radiomics models. The models were evaluated using the following metrics: area under the curve (AUC), calibration curve (CAL), decision curve analysis (DCA), concordance index (C-index), and Brier score.
Results: Sixteen features were selected, and models were built using LR, DT, RF, and SVM. In the training cohort, the AUCs were .723, .842, .995, and .883, respectively; in the validation cohort, they were .658, .567, .88, and .765. The RF model had the best AUC, and its CAL, C-index (training cohort = .998; validation cohort = .883), and Brier score (training cohort = .007; validation cohort = .137) showed satisfactory predictive accuracy. DCA indicated that the RF model had better clinical application value than the other models.
Conclusion: Machine learning models based on computed tomography images can be used to evaluate EGFR status in patients with non-small cell lung cancer, and the RF model outperformed the LR, DT, and SVM models.
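The evaluation metrics named in the Methods can be made concrete with a minimal sketch in plain Python. This is not the authors' implementation. The function names and the toy labels and probabilities below are hypothetical.

The sketch computes three quantities:
- The AUC as a Mann–Whitney statistic. For a binary outcome, this coincides with Harrell's C-index computed on the same predictions.
- The Brier score.
- The net benefit used in decision curve analysis, NB = TP/n − (FP/n)·pt/(1 − pt) at threshold probability pt.

```python
from typing import Sequence


def auc_mann_whitney(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 1/2).
    For a binary outcome this equals Harrell's C-index."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of classifying as positive every patient whose predicted
    probability is >= threshold: TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def treat_all_net_benefit(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy in DCA: classify every patient as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = EGFR-mutant, 0 = wild-type (not study data).
    y = [1, 1, 1, 0, 0, 0]
    prob = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
    print(f"AUC   = {auc_mann_whitney(y, prob):.3f}")
    print(f"Brier = {brier_score(y, prob):.3f}")
    # A decision curve is net benefit evaluated over a range of thresholds,
    # compared with "treat all" and "treat none" (whose net benefit is 0).
    for t in (0.1, 0.3, 0.5, 0.7):
        print(f"pt={t:.1f}  model={net_benefit(y, prob, t):+.3f}  "
              f"treat_all={treat_all_net_benefit(y, t):+.3f}")
```

Plotting the model's net benefit against the treat-all and treat-none lines across thresholds gives the decision curve. A model adds clinical value at the thresholds where its curve lies above both reference lines.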