Four‐gene signature based on machine learning filtration could predict prognosis of patients with breast cancer

Bo Liu, Huina Wang, Xin Wang, Junqi Long, Xujie Zhuang, Xinchan Ji, Nian Zhu, Jinmeng Li, Ting Gao, Xuehui Zhang, Jiangyong Yu, Shuangtao Zhao
DOI: https://doi.org/10.1111/exsy.13157
IF: 3.3
2022-11-01
Expert Systems
Abstract:
Background: This study proposes a breast cancer prediction model for early diagnosis and prognosis management.
Objective: To explore the pathogenesis of breast cancer and develop accurate screening and treatment methods, we applied machine‐learning techniques to breast cancer genetic data to obtain a new breast cancer signature and prognostic prediction models.
Methods: We identified an optimal clustering by applying unsupervised clustering methods to differentially expressed genes (DEGs) between normal (n = 113) and tumour (n = 1,102) samples. Using least absolute shrinkage and selection operator (LASSO) regression, we selected four biomarkers and built a predictive model by Cox regression in the training set (n = 1,083), then validated its predictive accuracy and independence in the testing sets (n = 2,480). Gene Set Enrichment Analysis (GSEA) then revealed the biological pathways enriched in the clusters. Finally, we constructed a nomogram combining this signature with other significant risk factors to predict patient survival rates.
Results: Four mRNAs (CD163L1, QPRT, NKAIN1 and TP53AIP1) were identified from 4,938 DEGs between the two clusters, and a four‐gene model (risk score = 0.454*CD163L1 - 0.360*NKAIN1 + 0.581*QPRT + 0.788*TP53AIP1) was then established to divide patients into high‐ and low‐risk groups with significantly different prognosis (p 1.60; p
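As an illustration of the four‐gene risk score stated in the abstract, the sketch below computes the linear score from per‐patient expression values and splits patients into high‐ and low‐risk groups. The coefficients come from the abstract; the function names, the example expression values, and the median cutoff are assumptions made here for illustration, since the abstract does not state how the threshold was chosen or how expression was normalised.

```python
from statistics import median

# Coefficients as given in the abstract's risk score formula.
COEFFICIENTS = {
    "CD163L1": 0.454,
    "NKAIN1": -0.360,
    "QPRT": 0.581,
    "TP53AIP1": 0.788,
}


def risk_score(expression):
    """Weighted sum of the four genes' expression values for one patient."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def stratify(patients, cutoff=None):
    """Label each patient 'high' or 'low' risk.

    The cutoff defaults to the cohort median score, which is an assumption
    of this sketch and not a threshold reported in the abstract.
    """
    scores = {pid: risk_score(expr) for pid, expr in patients.items()}
    if cutoff is None:
        cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


if __name__ == "__main__":
    # Hypothetical expression values, for demonstration only.
    cohort = {
        "patient_A": {"CD163L1": 0.2, "NKAIN1": 1.5, "QPRT": 0.1, "TP53AIP1": 0.0},
        "patient_B": {"CD163L1": 1.8, "NKAIN1": 0.3, "QPRT": 1.2, "TP53AIP1": 0.9},
    }
    for pid, group in stratify(cohort).items():
        print(pid, round(risk_score(cohort[pid]), 3), group)
```

NKAIN1 carries the only negative coefficient, so in this model higher NKAIN1 expression lowers the score while the other three genes raise it.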
computer science, artificial intelligence, theory & methods