Integrating Embeddings of Multiple Gene Networks to Prioritize Complex Disease-Associated Genes

Mengmeng Wu, Wanwen Zeng, Wenqiang Liu, Yijia Zhang, Ting Chen, Rui Jiang
DOI: https://doi.org/10.1109/bibm.2017.8217651
2017-01-01
Abstract: Genome-wide association studies (GWAS), a primary approach in genetic research, have been successfully applied to a variety of complex diseases, leading to the discovery of a substantial number of disease-associated loci. These associations provide unprecedented opportunities to deepen our understanding of complex diseases, including their risk variants, genes, and pathways. However, extracting biological knowledge from GWAS data is non-trivial because of several non-negligible factors. For example, the majority of associated loci fall into noncoding regions without clear links to any genes, which complicates their functional characterization. Network-based GWAS gene prioritization, which integrates gene networks with GWAS data, has emerged as a promising direction for addressing these challenges and has recently attracted much attention. However, gene networks are usually sparse and noisy, and existing methods do not explicitly account for these properties, leading to suboptimal performance. In this paper, we proposed a novel method called REGENT for integrating multiple gene networks with GWAS data to prioritize complex disease-associated genes. Specifically, we leveraged network representation learning, a recently developed technique for analyzing social networks, to learn compact and robust embeddings from multiple gene networks. To integrate these learned gene embeddings with GWAS data, we developed a hierarchical statistical model and derived an efficient inference algorithm for model estimation and prediction. Applying REGENT to GWAS data of six complex diseases, we demonstrated that it outperformed existing methods in identifying known disease-associated genes. In addition, pathway analysis showed that REGENT helped discover disease-associated pathways. Our method is therefore expected to be a useful tool for post-GWAS analysis.
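The abstract does not specify which network representation learning method REGENT uses or how its hierarchical model is defined. To illustrate only the general idea of learning one compact vector per gene from each of several gene networks and combining them, here is a minimal sketch that swaps in spectral embedding (top eigenvectors of the symmetrically normalized adjacency matrix), which is not the paper's method. The function names `spectral_embedding` and `integrate_networks`, the embedding dimension, and the concatenate-then-normalize combination step are hypothetical choices made for illustration; the GWAS integration step is not shown.

```python
import numpy as np


def spectral_embedding(adj: np.ndarray, dim: int) -> np.ndarray:
    """Hypothetical stand-in for network representation learning:
    embed the genes of one undirected network using the top `dim`
    eigenvectors of D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)
    # Isolated genes (degree 0) get a zero scaling factor instead of dividing by zero.
    inv_sqrt = np.zeros_like(deg, dtype=float)
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])
    norm_adj = inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    # eigh handles symmetric matrices and returns eigenvalues in ascending order,
    # so reverse the columns to take the eigenvectors with the largest eigenvalues.
    _, vecs = np.linalg.eigh(norm_adj)
    return vecs[:, ::-1][:, :dim]


def integrate_networks(adjs: list[np.ndarray], dim: int) -> np.ndarray:
    """Hypothetical combination step: embed each network separately
    (all networks share the same gene ordering), concatenate the
    per-network vectors, and L2-normalize each gene's row."""
    blocks = [spectral_embedding(a, dim) for a in adjs]
    emb = np.hstack(blocks)
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    return emb / norms


# Usage: two toy 4-gene networks over the same gene ordering.
net_a = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
net_b = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
gene_vectors = integrate_networks([net_a, net_b], dim=2)
print(gene_vectors.shape)
```

Eigenvector signs are arbitrary, so individual coordinates are not comparable across runs or networks. Only relations between genes within the combined space, such as cosine similarity, are meaningful in this sketch.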