Unveiling Prognostic RNA Biomarkers through a Multi-Cohort Study in Colorectal Cancer

Zehwan Kim, Jaebon Lee, Ye Eun Yoon, Jae Won Yun
DOI: https://doi.org/10.3390/ijms25063317
IF: 5.6
2024-03-15
International Journal of Molecular Sciences
Abstract: Because cancer is a leading cause of death and is thought to be caused by genetic errors or genomic instability in many circumstances, there have been studies exploring cancer's genetic basis using microarray and RNA-seq methods, linking gene expression data to patient survival. This research introduces a methodological framework, combining heterogeneous gene expression data, random forest selection, and pathway analysis, alongside clinical information and Cox regression analysis, to discover prognostic biomarkers. Heterogeneous gene expression data for colorectal cancer were collected from TCGA-COAD (RNA-seq), and GSE17536 and GSE39582 (microarray), and were integrated with Entrez Gene IDs. Using Cox regression analysis and random forest, genes with consistent hazard ratios and significantly affecting patient survivability were chosen. Predictive accuracy was evaluated using ROC curves. Pathway analysis identified potential RNA biomarkers. The authors identified 28 RNA biomarkers. Pathway analysis revealed enrichment in cancer-related pathways, notably EGFR downstream signaling and IGF1R signaling. Three RNA biomarkers (ZEB1-AS1, PI4K2A, and ITGB8-AS1) and two clinical biomarkers (stage and age) were chosen for a prognostic model, improving predictive performance compared to using clinical biomarkers alone. Despite biomarker identification challenges, this study underscores the integration of heterogeneous gene expression data for discovery.
biochemistry & molecular biology,chemistry, multidisciplinary
What problem does this paper attempt to address?
This paper aims to identify prognostic RNA biomarkers in colorectal cancer through a multi-cohort study. Specifically, it establishes a systematic methodological framework that combines heterogeneous gene expression data, random forest selection, and pathway analysis with clinical information and Cox regression analysis, in order to discover prognostic biomarkers significantly associated with patient survival.

The objectives of the study are:

1. Establish a methodological framework that systematically identifies prognostic biomarkers from multi-cohort data while minimizing false positives.
2. Use this framework to identify and describe RNA biomarkers in colorectal cancer that are independent of clinical parameters such as age and stage.

To achieve these goals, the researchers collected heterogeneous gene expression data from TCGA-COAD (RNA-seq) and from GSE17536 and GSE39582 (microarray), and carried out the following steps:

1. **Data integration**: Integrate gene expression data from the different sources using Entrez Gene IDs.
2. **Cox regression analysis**: Use the Cox proportional hazards model to screen for genes that have a consistent hazard ratio across the three cohorts and significantly affect patient survival.
3. **Feature selection**: Select key features for predictive modeling with a random forest model.
4. **Pathway analysis**: Identify potential RNA biomarkers and the signaling pathways in which they are involved.
5. **Model evaluation**: Use ROC curves to evaluate the accuracy of the prediction model.

Finally, the researchers identified 28 RNA biomarkers. Pathway analysis showed that these biomarkers were enriched in cancer-related pathways, especially the EGFR downstream signaling pathway and the IGF1R signaling pathway.
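The Cox screening step can be illustrated with a small sketch. It assumes that per-cohort hazard ratios and p-values have already been computed by a Cox model for each gene, and it keeps only genes whose hazard ratio points in the same direction in every cohort and is significant in every cohort. The exact consistency criterion and significance threshold used in the paper are not given here, so `alpha = 0.05`, the function name `consistent_genes`, the gene labels, and all numbers are illustrative assumptions, not values from the study.

```python
from typing import Dict, List, Tuple

# Per-gene Cox results: cohort name -> (hazard_ratio, p_value).
CohortStats = Dict[str, Tuple[float, float]]

COHORTS = ["TCGA-COAD", "GSE17536", "GSE39582"]


def consistent_genes(
    results: Dict[str, CohortStats],
    cohorts: List[str],
    alpha: float = 0.05,  # assumed threshold, not taken from the paper
) -> List[str]:
    """Keep genes with same-direction, significant hazard ratios in all cohorts."""
    kept = []
    for gene, stats in results.items():
        # A gene must have been measured in every cohort to be comparable.
        if any(cohort not in stats for cohort in cohorts):
            continue
        hazard_ratios = [stats[cohort][0] for cohort in cohorts]
        p_values = [stats[cohort][1] for cohort in cohorts]
        # "Consistent" here means all HR > 1 (risk) or all HR < 1 (protective).
        same_direction = all(hr > 1.0 for hr in hazard_ratios) or all(
            hr < 1.0 for hr in hazard_ratios
        )
        significant = all(p < alpha for p in p_values)
        if same_direction and significant:
            kept.append(gene)
    return sorted(kept)


# Toy input with made-up numbers and placeholder gene labels.
toy_results = {
    "gene_A": {"TCGA-COAD": (1.8, 0.01), "GSE17536": (1.5, 0.03), "GSE39582": (1.6, 0.002)},
    # Direction flips in one cohort, so the gene is dropped.
    "gene_B": {"TCGA-COAD": (1.4, 0.02), "GSE17536": (0.7, 0.04), "GSE39582": (1.3, 0.01)},
    # Not significant in one cohort, so the gene is dropped.
    "gene_C": {"TCGA-COAD": (0.6, 0.01), "GSE17536": (0.7, 0.20), "GSE39582": (0.5, 0.03)},
    # Missing from one cohort, so the gene is dropped.
    "gene_D": {"TCGA-COAD": (0.5, 0.001), "GSE17536": (0.6, 0.01)},
    "gene_E": {"TCGA-COAD": (0.6, 0.01), "GSE17536": (0.7, 0.02), "GSE39582": (0.5, 0.03)},
}

selected = consistent_genes(toy_results, COHORTS)
print(selected)  # ['gene_A', 'gene_E']
```

Requiring agreement across independent cohorts is one simple way to reduce false positives, since a gene that is significant in a single dataset by chance is unlikely to show the same effect direction and significance in the other two.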
In addition, they selected three RNA biomarkers (ZEB1-AS1, PI4K2A, and ITGB8-AS1) and two clinical biomarkers (stage and age) to construct a prognostic model, and the predictive performance of this model is better than that of the model using only clinical biomarkers.
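The comparison between the two models can be sketched by computing the area under the ROC curve (AUC) for each set of risk scores. This is a minimal sketch that treats the outcome as a binary event (for example, death within a fixed follow-up window), which is a simplifying assumption; the paper's actual ROC procedure may differ. The function name `roc_auc`, the labels, and the scores are all hypothetical toy values and do not reproduce any result from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = event occurred, 0 = no event (made-up values).
event = [1, 1, 1, 0, 0, 0]
# Hypothetical risk scores from a clinical-only model (stage and age).
clinical_only = [0.60, 0.40, 0.70, 0.50, 0.30, 0.65]
# Hypothetical risk scores from a model that also uses RNA biomarkers.
clinical_plus_rna = [0.80, 0.55, 0.70, 0.50, 0.30, 0.60]

auc_clinical = roc_auc(event, clinical_only)
auc_combined = roc_auc(event, clinical_plus_rna)
print(round(auc_clinical, 2), round(auc_combined, 2))  # 0.67 0.89
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a higher AUC for the combined scores is the kind of evidence that would support adding RNA biomarkers to the clinical variables.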