A Hierarchical Structure-Aware Embedding Method for Predicting Phenotype-Gene Associations

Lin Wang, Mingming Liu, Wenqian He, Xu Jin, Maoqiang Xie, Yalou Huang
DOI: https://doi.org/10.1007/978-3-030-75762-5_10
2021-01-01
Abstract: Identifying potential causal genes for disease phenotypes is essential for disease treatment and facilitates drug development. Inspired by existing random-walk-based embedding methods and the hierarchical structure of the Human Phenotype Ontology (HPO), this work presents a Hierarchical Structure-Aware Embedding Method (HSAEM) for predicting phenotype-gene associations, which explicitly incorporates node type information and individual node differences into random walks. Unlike existing meta-path-guided heterogeneous network embedding techniques, HSAEM estimates an individual jumping probability for each node, learned from the hierarchical structure of phenotypes and the differing influence of nodes among genes. When a random walk is performed over the heterogeneous network, which comprises the HPO, phenotype-gene, and Protein-Protein Interaction (PPI) networks, the jumping probability guides the current node to select either a heterogeneous neighbor or a homogeneous neighbor as the next node. The generated node sequences are then fed into a heterogeneous SkipGram model to learn node representations. By defining the individual jumping probability based on hierarchical structure, HSAEM can effectively capture the co-occurrence of nodes in the heterogeneous network. HSAEM outperforms the baselines on statistical evaluation metrics and also proves effective in practice at prioritizing causal genes for Parkinson's Disease.
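The walk step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formula for the jumping probability, so the sketch simply assumes each node already has one in a `jump_prob` dictionary. The function name `jump_walk`, the toy graph, and the probability values are all hypothetical.

```python
import random

# Toy heterogeneous network (hypothetical): two phenotypes linked by an
# HPO edge, one phenotype-gene association, and one PPI edge.
node_type = {"p1": "phenotype", "p2": "phenotype", "g1": "gene", "g2": "gene"}
neighbors = {
    "p1": ["p2"],          # HPO edge
    "p2": ["p1", "g1"],    # HPO edge + phenotype-gene association
    "g1": ["p2", "g2"],    # phenotype-gene association + PPI edge
    "g2": ["g1"],          # PPI edge
}
# Assumed per-node jumping probabilities; in the paper these are derived
# from the phenotype hierarchy and gene influence, which is not modeled here.
jump_prob = {"p1": 0.3, "p2": 0.6, "g1": 0.5, "g2": 0.2}


def jump_walk(start, node_type, neighbors, jump_prob, walk_length, rng=None):
    """Generate one random walk of at most `walk_length` nodes.

    At each step the current node moves to a neighbor of a different type
    (heterogeneous) with probability jump_prob[node], and otherwise to a
    neighbor of the same type (homogeneous). If only one kind of neighbor
    exists, that kind is used.
    """
    rng = rng or random.Random()
    walk = [start]
    current = start
    while len(walk) < walk_length:
        adjacent = neighbors.get(current, [])
        hetero = [n for n in adjacent if node_type[n] != node_type[current]]
        homo = [n for n in adjacent if node_type[n] == node_type[current]]
        if not hetero and not homo:
            break  # isolated node: stop the walk early
        if hetero and (not homo or rng.random() < jump_prob[current]):
            candidates = hetero
        else:
            candidates = homo
        current = rng.choice(candidates)
        walk.append(current)
    return walk


# Generate a few walks per node; these sequences would be the input to a
# SkipGram-style embedding model.
rng = random.Random(42)
walks = [jump_walk(n, node_type, neighbors, jump_prob, 8, rng)
         for n in node_type for _ in range(3)]
```

Because the probability is stored per node rather than fixed by a meta-path, two phenotypes at different depths of the hierarchy can differ in how readily they cross over to the gene network, which is the distinction the abstract draws against meta-path-guided methods.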