Multipath2vec: Predicting Pathogenic Genes Via Heterogeneous Network Embedding

Bo Xu, Yu Liu, Shuo Yu, Lei Wang, Lei Liu, Hongfei Lin, Zhihao Yang, Jian Wang, Feng Xia
DOI: https://doi.org/10.1109/BIBM.2018.8621103
2018-01-01
Abstract: Phenotypically similar diseases have been shown to be associated with specific genes. Predicting disease genes is important for disease prevention, diagnosis, and treatment. In this work, we address this problem and propose a method for predicting disease-causing genes called Multipath2vec. First, we build a heterogeneous network called GP-network from three kinds of relationships between genes and phenotypes: interactions between genes, correlations between phenotypes, and known gene-phenotype pairs. Then, we propose the multi-path, which guides random walks in GP-network so that the network is embedded more effectively. Finally, we use the resulting vector representation of each protein and phenotype to calculate and rank the similarities between candidate genes and the target phenotype. We evaluate Multipath2vec and two baseline approaches (CATAPULT and PRINCE) on whole gene-phenotype data, single-gene gene-phenotype data, and many-genes gene-phenotype data. Under leave-one-out cross-validation, Multipath2vec achieves better results than the baseline approaches. To the best of our knowledge, this is the first attempt to apply a heterogeneous network embedding method to pathogenic gene prediction. The superior experimental results of Multipath2vec suggest that network representation methods are a promising approach to predicting disease-causing genes.
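The three steps in the abstract (build a gene-phenotype network, run path-guided random walks, rank candidate genes by similarity to a phenotype) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not define the multi-path, so the sketch assumes it is a set of node-type sequences such as "GGP" that constrain which neighbor type a walk may visit next. It also substitutes simple co-occurrence count vectors for a learned embedding, and the toy node names, edges, and function names are all hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

# Toy GP-network; every node and edge here is invented for illustration.
EDGES = [
    ("g1", "g2"), ("g2", "g3"),                 # gene-gene interactions
    ("p1", "p2"),                               # phenotype-phenotype correlations
    ("g1", "p1"), ("g2", "p1"), ("g3", "p2"),   # known gene-phenotype pairs
]


def node_type(node):
    """Return "G" for gene nodes and "P" for phenotype nodes."""
    return "G" if node.startswith("g") else "P"


def build_adjacency(edges):
    """Build an undirected adjacency list."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def guided_walk(adj, start, path, length, rng):
    """Random walk whose i-th node must have type path[i % len(path)].

    Assumption: this cyclic type pattern stands in for the paper's multi-path.
    """
    walk = [start]
    for i in range(1, length):
        wanted = path[i % len(path)]
        options = [n for n in adj[walk[-1]] if node_type(n) == wanted]
        if not options:
            break  # no neighbor of the required type; stop early
        walk.append(rng.choice(options))
    return walk


def cooccurrence_vectors(walks, window=2):
    """Stand-in for a learned embedding: count neighbors within a window."""
    vecs = defaultdict(Counter)
    for walk in walks:
        for i, node in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[node][walk[j]] += 1
    return vecs


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_genes(vecs, phenotype, candidates):
    """Order candidate genes by similarity to the target phenotype."""
    scores = {g: cosine(vecs[g], vecs[phenotype]) for g in candidates}
    return sorted(candidates, key=lambda g: scores[g], reverse=True)


rng = random.Random(0)
adj = build_adjacency(EDGES)
paths = ["GP", "GGP", "GPP"]  # assumed type sequences, each starting at a gene
genes = ["g1", "g2", "g3"]
walks = [guided_walk(adj, g, p, 10, rng)
         for p in paths for g in genes for _ in range(20)]
vecs = cooccurrence_vectors(walks)
ranking = rank_genes(vecs, "p1", genes)
print(ranking)
```

In a fuller pipeline the co-occurrence step would typically be replaced by a skip-gram model trained on the walks, and evaluation would hide one known gene-phenotype pair at a time, as in the leave-one-out protocol the abstract mentions.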