A pathway-based computational framework for identification of multi-omics biomarkers and its application in esophageal cancer

Qi Zhou, Weicai Ye, Xiaolan Yu, Yun-Juan Bao
DOI: https://doi.org/10.1016/j.cmpb.2024.108077
IF: 6.1
2024-02-15
Computer Methods and Programs in Biomedicine
Abstract:
Background: The pathway-based strategy has recently been proposed for identifying biomarkers. It offers higher biological interpretability and cross-data robustness than the conventional gene-based strategy. However, its clinical utility has been limited by high computational complexity and ill-defined performance.
Objective: This study presents a machine learning-based computational framework that uses multi-omics data to identify a new type of biomarker, called pathway-derived core biomarkers, which combine the advantages of gene-based and pathway-based biomarkers.
Methods: Machine-learning methods and a gene-pathway network were integrated to select the pathway-derived core biomarkers. Multiple machine-learning algorithms were used to construct and validate diagnostic models of these biomarkers on more than 1,400 multi-omics clinical samples of esophageal squamous cell carcinoma (ESCC).
Results: Classifier models based on the new biomarkers achieved superior performance in the training datasets. Average AUC/accuracy was 0.98/0.95 for mRNAs and 0.89/0.81 for miRNAs, higher than currently known classifier models based on the conventional gene-based and pathway-based strategies. In the testing cohorts, AUC/accuracy increased by 6.1%/7.3% compared with models based on the native gene-based biomarkers. Independent validation cohorts further confirmed the improved performance: sensitivity/specificity increased by ∼3%, and variance decreased significantly, by ∼69%, compared with the native gene-based biomarkers.
Importantly, the pathway-derived core biomarkers also recovered 45% more previously reported biomarkers than the gene-based biomarkers. They were also more functionally relevant to ESCC etiology, being involved in 14 versus 7 pathways related to ESCC or other cancers. This enhanced functional relevance underpins the cross-data robustness of the new type of biomarker.
Conclusions: The results demonstrate that the new type of biomarker offers improved predictive performance and robustness. It also exhibits higher functional interpretability, which supports its potential application in cancer diagnosis.
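The abstract describes two steps: selecting core genes with the help of a gene-pathway network, then scoring diagnostic classifiers by AUC and accuracy. The sketch below illustrates both in plain Python. It is not the authors' method, and the abstract does not specify their selection procedure. The degree-filter rule in the hypothetical `core_genes` function (keep candidate genes that belong to at least `min_pathways` pathways) is an assumption chosen only for illustration. The gene and pathway names and the scores are toy data. AUC is computed as the probability that a random positive sample outranks a random negative one, which is the standard rank-based definition.

```python
from collections import Counter

def core_genes(candidates, pathways, min_pathways=2):
    """Hypothetical selection rule (assumption, not the paper's): keep candidate
    genes whose degree in the gene-pathway network is at least `min_pathways`."""
    degree = Counter(
        g for members in pathways.values() for g in set(members) if g in candidates
    )
    return sorted(g for g, d in degree.items() if d >= min_pathways)

def auc(labels, scores):
    """ROC AUC: fraction of (positive, negative) pairs where the positive scores
    higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

if __name__ == "__main__":
    # Toy gene-pathway network (hypothetical membership, illustration only)
    pathways = {"P1": ["TP53", "CDKN2A", "EGFR"], "P2": ["TP53", "EGFR", "MYC"], "P3": ["MYC", "SOX2"]}
    print(core_genes({"TP53", "EGFR", "SOX2", "NOTCH1"}, pathways))  # → ['EGFR', 'TP53']
    labels = [1, 1, 1, 0, 0, 0]            # 1 = tumor, 0 = normal (toy labels)
    scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]  # classifier probabilities (toy values)
    print(round(auc(labels, scores), 3))       # → 0.889
    print(round(accuracy(labels, scores), 3))  # → 0.667
```

The example shows why the abstract reports both metrics. AUC (0.889) depends only on how the samples are ranked. Accuracy (0.667) depends on the chosen decision threshold, so the two can diverge for the same model.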
Engineering, Biomedical; Computer Science, Interdisciplinary Applications; Medical Informatics; Theory & Methods