Predicting gene-disease associations from the heterogeneous network using graph embedding
Xiaochan Wang, Yuchong Gong, Jing Yi, Wen Zhang
DOI: https://doi.org/10.1109/BIBM47256.2019.8983134
2019-01-01
Abstract: The discovery of gene-disease associations is important for the prevention, diagnosis and treatment of diseases. Studies on gene-disease associations have produced diverse data that can facilitate gene-disease association prediction, and integrating this diverse information is critical for developing high-accuracy prediction models. In this paper, we propose a heterogeneous network-based method, abbreviated as “HNEEM”, that enhances gene-disease association prediction by using graph embedding and ensemble learning. To combine diverse information, a heterogeneous network is constructed from gene-disease associations, gene-chemical associations and disease-chemical associations. The network uses genes, diseases and chemicals as nodes and their associations as edges. Graph embedding methods are used to extract representation vectors of the nodes in the heterogeneous network, the feature vectors of genes and diseases are merged to represent gene-disease pairs, and a random forest is employed to build the prediction model on these gene-disease pairs. We consider six types of graph embedding methods. The features generated by each individual graph embedding method are used to build a prediction model that serves as a base predictor, and the base predictors are then combined to develop the ensemble learning model HNEEM. We comprehensively compare the different graph embedding methods, and the results demonstrate that graph embedding methods produce satisfactory results in gene-disease association prediction and that integrating different graph embedding methods yields further improvements. In computational experiments, HNEEM produces better results than state-of-the-art gene-disease prediction methods, and HNEEM is also robust to data richness. Moreover, the usefulness of HNEEM is validated by case studies. In conclusion, HNEEM is a promising method for predicting gene-disease associations.
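The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how gene and disease vectors are merged or how base predictors are combined, so concatenation and an unweighted mean are assumptions here. All node identifiers, embedding values and scores are made-up placeholders, and the function names are hypothetical. The graph embedding and random forest steps are not implemented; their outputs are stood in for by fixed values.

```python
from collections import defaultdict

# Toy association lists; every identifier here is made up for illustration.
gene_disease = [("G1", "D1"), ("G2", "D1")]
gene_chemical = [("G1", "C1"), ("G2", "C2")]
disease_chemical = [("D1", "C1"), ("D2", "C2")]


def build_heterogeneous_network(*edge_lists):
    """Return an undirected adjacency map over genes, diseases and chemicals."""
    adjacency = defaultdict(set)
    for edges in edge_lists:
        for u, v in edges:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def pair_features(embedding, gene, disease):
    """Merge a gene vector and a disease vector (assumption: concatenation)."""
    return embedding[gene] + embedding[disease]


def ensemble_score(base_scores):
    """Combine base-predictor scores (assumption: unweighted mean)."""
    return sum(base_scores) / len(base_scores)


network = build_heterogeneous_network(gene_disease, gene_chemical, disease_chemical)

# Placeholder 2-dimensional vectors standing in for the output of one
# graph embedding method run on the network.
embedding = {
    "G1": [0.1, 0.9],
    "G2": [0.4, 0.2],
    "D1": [0.3, 0.8],
    "D2": [0.7, 0.1],
}
features = pair_features(embedding, "G1", "D2")

# Placeholder scores, as if produced by base predictors trained on
# features from three different embedding methods.
score = ensemble_score([0.25, 0.5, 0.75])
```

In a fuller pipeline, each embedding method would yield its own `embedding` table, a random forest would be trained on the resulting pair features, and the per-method prediction scores would be passed to the combining step.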