A network-based deep learning methodology for stratification of tumor mutations

Chuang Liu,Zhen Han,Zi-Ke Zhang,Ruth Nussinov,Feixiong Cheng
DOI: https://doi.org/10.1093/bioinformatics/btaa1099
IF: 5.8
2021-01-01
Bioinformatics
Abstract:Abstract Motivation Tumor stratification has a wide range of biomedical and clinical applications, including diagnosis, prognosis and personalized treatment. However, cancer is always driven by the combination of mutated genes, which are highly heterogeneous across patients. Accurately subdividing the tumors into subtypes is challenging. Results We developed a network-embedding based stratification (NES) methodology to identify clinically relevant patient subtypes from large-scale patients’ somatic mutation profiles. The central hypothesis of NES is that two tumors would be classified into the same subtypes if their somatic mutated genes located in the similar network regions of the human interactome. We encoded the genes on the human protein–protein interactome with a network embedding approach and constructed the patients’ vectors by integrating the somatic mutation profiles of 7344 tumor exomes across 15 cancer types. We firstly adopted the lightGBM classification algorithm to train the patients’ vectors. The AUC value is around 0.89 in the prediction of the patient’s cancer type and around 0.78 in the prediction of the tumor stage within a specific cancer type. The high classification accuracy suggests that network embedding-based patients’ features are reliable for dividing the patients. We conclude that we can cluster patients with a specific cancer type into several subtypes by using an unsupervised clustering algorithm to learn the patients’ vectors. Among the 15 cancer types, the new patient clusters (subtypes) identified by the NES are significantly correlated with patient survival across 12 cancer types. In summary, this study offers a powerful network-based deep learning methodology for personalized cancer medicine. Availability and implementation Source code and data can be downloaded from https://github.com/ChengF-Lab/NES. Supplementary information Supplementary data are available at Bioinformatics online.
biochemical research methods,biotechnology & applied microbiology,mathematical & computational biology
What problem does this paper attempt to address?