Machine learning-based prediction model for distant metastasis of breast cancer

Hao Duan, Yu Zhang, Haoye Qiu, Xiuhao Fu, Chunling Liu, Xiaofeng Zang, Anqi Xu, Ziyue Wu, Xingfeng Li, Qingchen Zhang, Zilong Zhang, Feifei Cui
DOI: https://doi.org/10.1016/j.compbiomed.2024.107943
IF: 7.7
2024-01-06
Computers in Biology and Medicine
Abstract:
Background: Breast cancer is the most prevalent malignancy in women. Advanced breast cancer can develop distant metastases, which pose a severe threat to patients' lives. Because the clinical warning signs of distant metastasis appear only in the late stage of the disease, better methods of predicting metastasis are needed.
Methods: First, we screened target genes for breast cancer distant metastasis by performing differential expression analysis and weighted gene co-expression network analysis (WGCNA) on the selected datasets, and ran further analyses, such as GO enrichment analysis, on these target genes. Second, we screened the target genes by LASSO regression analysis to obtain biomarkers, and performed correlation analysis and other analyses on these biomarkers. Finally, we constructed several prediction models for breast cancer distant metastasis based on Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), Gradient Boosting Decision Tree (GBDT), and eXtreme Gradient Boosting (XGBoost), and selected the optimal model among them.
Results: Several 21-gene prediction models for breast cancer distant metastasis were constructed, of which the model based on random forest performed best. On the validation set, this model predicted the emergence of distant metastases with an accuracy of 93.6%, an F1-score of 88.9%, and an AUC of 91.3%.
Conclusion: Our findings have the potential to be translated into a point-of-care prognostic analysis to reduce breast cancer mortality.
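The three validation metrics reported in the abstract (accuracy, F1-score, and AUC) can be illustrated with a minimal sketch. This is not the authors' code: the function names are illustrative, the labels and scores are invented toy values (1 = distant metastasis, 0 = none), and the 0.5 decision threshold is an assumption. AUC is computed here by the pairwise-ranking definition, i.e. the fraction of positive/negative pairs in which the positive sample receives the higher score.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy validation set (invented values, NOT data from the study).
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8, 0.3, 0.05]  # hypothetical model outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 threshold

print("accuracy:", accuracy(y_true, y_pred))
print("F1-score:", f1_score(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

Accuracy and F1 depend on the chosen threshold, whereas AUC depends only on how the scores rank the samples, which is one reason the three are usually reported together when comparing classifiers such as those listed in the Methods.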
engineering, biomedical; computer science, interdisciplinary applications; mathematical & computational biology; biology