Construction of a Colorectal Cancer Prognostic Risk Model and Screening of Prognostic Risk Genes Using Machine-Learning Algorithms
Xuezhi Du,Han Qi,Wenbin Ji,Peiyuan Li,Run Hua,Wenliang Hu,Feng Qi
DOI: https://doi.org/10.1155/2022/9408839
IF: 2.809
2022-10-13
Computational and Mathematical Methods in Medicine
Abstract:This study is aimed at constructing a prognostic risk model for colorectal cancer (CRC) using machine-learning algorithms to provide accurate staging and screening of credible prognostic risk genes. We extracted CRC data from GSE126092 and GSE156355 of the Gene Expression Omnibus (GEO) database and datasets from TCGA to analyze the differentially expressed genes (DEGs) using bioinformatics analysis. Among the 330 shared DEGs related to CRC prognosis, we divided the analysis period into different phases and applied univariate COX regression, LASSO, and multivariate COX regression analysis. GO analysis and KEGG analysis revealed that the functions of these DEGs were primarily focused on cell cycle, DNA replication, cell mitosis, and other related functions, and this confirmed our results from a biological perspective. Finally, a prognostic risk model for CRC based on the CHGA, CLU, PLK1, AXIN2, NR3C2, IL17RB, GCG, and AJUBA genes was constructed, and the risk score enabled us to predict the prognosis for CRC. To obtain a comprehensive and accurate model, we used both internal and external evaluations, and the model was able to correctly differentiate patients with CRC into a high-risk group with poor prognosis and a low-risk group with good prognosis. The AUC values of the 3-, 5-, and 10-year survival ROC curves were 0.715, 0.721, and 0.777, respectively, according to the internal evaluation, and the AUC values were 0.606, 0.698, and 0.608, respectively, for the external evaluation using GSE39582 from the GEO database. We determined that CLU, PLK1, and IL17RB could be considered to be independent prognostic factors for CRC with significantly different expression ( P < 0.05 ). Using machine-learning methods, a prognostic risk model comprised of eight genes was constructed. Not only does this model provide improved treatment guidance, but it also provides a novel perspective for analyzing survival conditions at a deeper biological level.
mathematical & computational biology