Predicting distant metastasis of bladder cancer using multiple machine learning models: a study based on the SEER database with external validation
Xin Chang Zou, Xue Peng Rao, Jian Biao Huang, Jie Zhou, Hai Chao Chao, Tao Zeng
DOI: https://doi.org/10.3389/fonc.2024.1477166
IF: 4.7
2024-12-14
Frontiers in Oncology
Abstract: Background and purpose: Distant metastasis in bladder cancer is linked to poor prognosis and significant mortality. Machine learning (ML), a key area of artificial intelligence, has shown promise in the diagnosis, staging, and treatment of bladder cancer. This study aimed to employ various ML techniques to predict distant metastasis in patients with bladder cancer. Patients and methods: Patients diagnosed with bladder cancer in the Surveillance, Epidemiology, and End Results (SEER) database from 2000 to 2021 were included in this study. After a rigorous screening process, a total of 4,108 patients were selected for further analysis and divided in a 7:3 ratio into a training cohort and an internal validation cohort. In addition, 118 patients treated at the Second Affiliated Hospital of Nanchang University were included as an external validation cohort. Features were filtered using the least absolute shrinkage and selection operator (LASSO) regression algorithm. Based on the significant features identified, three ML algorithms were used to develop prediction models: logistic regression, support vector machine (SVM), and linear discriminant analysis (LDA). The predictive performance of the three models was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC), the precision, the accuracy, and the F1 score. Results: The overall rate of distant metastasis in the study population was 12.0% (n = 495). LASSO regression analysis revealed that age, chemotherapy, tumor size, the examination of non-regional lymph nodes, and regional lymph node evaluation were significantly associated with distant metastasis of bladder cancer. In the internal validation cohort, the prediction accuracy rates for logistic regression, SVM, and LDA were 0.874, 0.877, and 0.845, respectively. The precision rates were 0.805, 0.769, and 0.827, respectively, and the F1 scores were 0.821, 0.819, and 0.835, respectively. The ROC curve demonstrated that the AUC for all models was greater than 0.7. In the external validation cohort, the prediction accuracy rates for logistic regression, SVM, and LDA were 0.856, 0.848, and 0.797, respectively, with the ROC curve indicating that the AUC also exceeded 0.7. The precision rates were 0.877, 0.718, and 0.736, respectively, and the F1 scores were 0.797, 0.778, and 0.762, respectively. Among the algorithms used, logistic regression demonstrated better predictive efficiency than the other two methods. The top three variables with the highest importance scores in the logistic regression were non-regional lymph nodes, age, and chemotherapy. Conclusion: The prediction models developed using three ML algorithms demonstrated strong accuracy and discriminative capability in predicting distant metastasis in patients with bladder cancer. This might help clinicians in understanding patient prognosis and in formulating personalized treatment strategies, ultimately improving the overall prognosis of patients with bladder cancer.
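The four evaluation measures named in the abstract (accuracy, precision, F1 score, and AUC) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are all hypothetical, and the sketch assumes the plain positive-class definitions of precision and F1, since the abstract does not state which averaging convention was used.

```python
# Illustrative sketch only: toy data, hypothetical names, not the study's code.
# Label 1 = distant metastasis, label 0 = no distant metastasis.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case receives a
    higher score than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities for ten patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

if __name__ == "__main__":
    print(binary_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

Accuracy, precision, and F1 depend on the chosen threshold, whereas AUC is computed from the ranking of scores alone, which is why the abstract reports it separately from the other three measures.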
oncology