MODIG: Integrating Multi-Omics and Multi-Dimensional Gene Network for Cancer Driver Gene Identification Based on Graph Attention Network Model.

Wenyi Zhao, Xun Gu, Shuqing Chen, Jian Wu, Zhan Zhou
DOI: https://doi.org/10.1093/bioinformatics/btac622
IF: 5.8
2022-01-01
Bioinformatics
Abstract:
Motivation: Identifying genes that play a causal role in cancer evolution remains one of the biggest challenges in cancer biology. With the accumulation of high-throughput multi-omics data over decades, it has become a great challenge to integrate these data effectively into the identification of cancer driver genes.
Results: Here, we propose MODIG, a graph attention network (GAT)-based framework to identify cancer driver genes by combining multi-omics pan-cancer data (mutations, copy number variants, gene expression and methylation levels) with multi-dimensional gene networks. First, we established diverse types of gene relationship maps based on protein-protein interactions, gene sequence similarity, KEGG pathway co-occurrence, gene co-expression patterns and gene ontology. Then, we constructed a multi-dimensional gene network consisting of approximately 20 000 genes as nodes and five types of gene associations as multiplex edges. Based on this graph, we applied a GAT to model within-dimension interactions and generate a gene representation for each dimension. Moreover, we introduced a joint learning module to fuse the dimension-specific representations into general gene representations. Finally, we used the resulting gene representations to perform a semi-supervised driver gene identification task. The experimental results show that MODIG outperforms the baseline models in terms of area under the precision-recall curve and area under the receiver operating characteristic curve.
Availability and implementation: The MODIG program is available at . The code and data underlying this article are also available on Zenodo, at https://doi.org/10.5281/zenodo.7057241.
Contact: zhanzhou@zju.edu.cn or wujian2000@zju.edu.cn
Supplementary information is available at Bioinformatics online.
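The pipeline outlined in the abstract (one graph attention pass per network dimension, fusion of the dimension-specific representations, then semi-supervised classification) can be illustrated with a small NumPy sketch. This is not the MODIG implementation: the function names `gat_layer`, `fuse` and `masked_bce` are hypothetical, the weights are random rather than trained, a single attention head is used, and the attention-weighted fusion is only one plausible reading of the "joint learning module", which the abstract does not specify.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gat_layer(X, A, W, a_src, a_dst):
    """Single-head graph attention over one network dimension (assumed form)."""
    H = X @ W                                              # project features: (n, d)
    scores = (H @ a_src)[:, None] + (H @ a_dst)[None, :]   # raw score for each pair (i, j)
    scores = np.where(scores > 0, scores, 0.2 * scores)    # LeakyReLU, slope 0.2
    mask = (A + np.eye(A.shape[0])) > 0                    # neighbours plus self-loop
    scores = np.where(mask, scores, -np.inf)               # non-edges get zero weight
    alpha = softmax(scores, axis=1)                        # attention over neighbours
    return np.maximum(alpha @ H, 0.0)                      # aggregate, then ReLU

def fuse(reps, q):
    """Attention-weighted sum of dimension-specific representations (an assumption)."""
    Z = np.stack(reps, axis=1)                             # (n, k, d)
    beta = softmax(np.tanh(Z) @ q, axis=1)                 # per-gene weight for each dimension
    return (beta[..., None] * Z).sum(axis=1), beta

def masked_bce(p, y, labeled):
    """Binary cross-entropy computed on labeled genes only (semi-supervised setting)."""
    p = np.clip(p[labeled], 1e-7, 1 - 1e-7)
    y = y[labeled]
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean())

# Toy data: 6 genes, 4 omics features, 5 network dimensions (all values made up).
rng = np.random.default_rng(0)
n, f, d, k = 6, 4, 3, 5
X = rng.normal(size=(n, f))
adjs = []
for _ in range(k):
    U = np.triu(rng.random((n, n)) < 0.4, 1)               # random upper triangle
    adjs.append((U | U.T).astype(float))                   # symmetric adjacency matrix

reps = [
    gat_layer(X, A, rng.normal(size=(f, d)), rng.normal(size=d), rng.normal(size=d))
    for A in adjs
]
fused, beta = fuse(reps, rng.normal(size=d))
prob = 1.0 / (1.0 + np.exp(-(fused @ rng.normal(size=d)))) # driver-gene score per gene

y = np.array([1, 0, 0, 1, 0, 0], dtype=float)              # hypothetical labels
labeled = np.array([True, True, False, True, False, False])
loss = masked_bce(prob, y, labeled)
```

In a trained model the projection matrices, attention vectors and classifier weights would be learned by minimizing the masked loss, so that unlabeled genes still influence the result through the attention-based aggregation over their network neighbours.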