Discovering Biomarkers of Hepatocellular Carcinoma from Single-Cell RNA Sequencing Data by Cooperative Games on Gene Regulatory Network

Zishuang Zhang, Chenxi Sun, Zhi-Ping Liu
DOI: https://doi.org/10.1016/j.jocs.2022.101881
IF: 3.817
2022-01-01
Journal of Computational Science
Abstract: Effective and reliable biomarkers are a promising means of achieving early cancer diagnosis. The availability of high-throughput single-cell RNA sequencing (scRNA-seq) data opens an unprecedented opportunity to discover biomarkers by developing machine learning and feature selection methods. Existing biomarker screening methods, such as recursive feature elimination (RFE), often treat genes as isolated features and ignore the complex network relationships in which they are embedded. In addition, the interpretability of a cancer biomarker discovery model is as important as its classification accuracy. To address these problems, we propose a game-theoretic method that discovers gene modules on a gene regulatory network (GRN) to serve as biomarkers that better distinguish hepatocellular carcinoma (HCC) samples from healthy ones. Specifically, the network-based game theory method, called NGTM, is an interpretable, supervised feature selection procedure that explores gene modules. We regard the process of selecting genes into the model as a cooperative game. The contribution of each feature within a combination is evaluated by a cooperative game-theoretic metric, the Shapley value. Gene modules are extended on the GRN in the form of subnetworks, which makes the biomarkers identified by NGTM easy to interpret. Furthermore, our method is statistically verified by the Akaike information criterion (AIC) in model selection: there is a strong correlation between AIC and the area under the curve (AUC) in classification. In a comparison study, we test wrapper RFE and random feature extraction with a random forest classifier under the same conditions. NGTM achieves better classification performance than both, which demonstrates its advantage. The dysfunctions enriched in the biomarkers are also consistent with prior knowledge of the occurrence and development of HCC.
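To make the cooperative-game idea in the abstract concrete, the following is a minimal sketch of how exact Shapley values can be computed for a small set of features. It is not the NGTM implementation: the gene names (`geneA`, `geneB`, `geneC`), the function `shapley_values`, and the payoff table `TOY_SCORES` are hypothetical, and the payoffs are made-up numbers standing in for whatever coalition score (for example, a classifier's performance on that gene subset) a real method would use.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a small cooperative game.

    players: list of hashable player ids (hypothetical gene names here).
    value:   function mapping a frozenset of players to a numeric payoff.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when joining coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Made-up coalition payoffs (assumption, not data from the paper).
# geneC never changes the payoff of any coalition it joins.
TOY_SCORES = {
    frozenset(): 0.0,
    frozenset({"geneA"}): 0.2,
    frozenset({"geneB"}): 0.1,
    frozenset({"geneC"}): 0.0,
    frozenset({"geneA", "geneB"}): 0.4,
    frozenset({"geneA", "geneC"}): 0.2,
    frozenset({"geneB", "geneC"}): 0.1,
    frozenset({"geneA", "geneB", "geneC"}): 0.4,
}

genes = ["geneA", "geneB", "geneC"]
phi = shapley_values(genes, lambda s: TOY_SCORES[frozenset(s)])
for gene in genes:
    print(gene, round(phi[gene], 4))
```

In this toy game the contributions sum to the payoff of the full coalition, and a gene that adds nothing to any coalition receives a value of zero. Exact enumeration grows exponentially with the number of players, so it is only practical for small candidate sets; restricting candidates to neighbours on a network, as the abstract describes for module extension, is one way such a search space could be kept small.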