Phen2Disease: A Phenotype-driven Semantic Similarity-based Integrated Model for Disease and Gene Prioritization

Weiqi Zhai, Xiaodi Huang, Nan Shen, Shanfeng Zhu
DOI: https://doi.org/10.1101/2022.12.02.518845
2022-01-01
Abstract: Approaches that use the Human Phenotype Ontology (HPO) to prioritize disease-causing genes for patients have recently become popular. However, these approaches do not make comprehensive use of the phenotype information of diseases and patients. We present a new method called Phen2Disease that calculates similarity scores between the phenotype sets of patients and diseases and uses them to prioritize diseases and genes. Specifically, we calculate three information content-based similarity scores, using the phenotypes and their combination as the respective benchmarks, and integrate them into a final score. Comprehensive experiments were conducted on six real data cohorts with 2051 cases and two simulated data cohorts with 1000 cases. When only phenotype information and the HPO knowledge base were used, Phen2Disease outperformed all three state-of-the-art methods it was compared with, particularly in cohorts with fewer HPO terms per case on average. We found that patients with higher information content scores had more specific phenotype information, so their predictions were more accurate. In addition, Phen2Disease is highly interpretable because it provides the ranked diseases together with the patient HPO terms.

### Competing Interest Statement
The authors have declared no competing interest.
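To make the idea of information content-based similarity between phenotype sets concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring formulas, so the sketch assumes Resnik-style term similarity (the information content of the most informative common ancestor), best-match averaging in each direction, and a plain mean as the integration step. The ontology, term names, and disease annotations are hypothetical toy data, not real HPO identifiers.

```python
import math

# Hypothetical toy ontology: term -> list of parent terms (not real HPO IDs).
PARENTS = {
    "root": [],
    "neuro": ["root"],
    "seizure": ["neuro"],
    "ataxia": ["neuro"],
    "cardio": ["root"],
    "arrhythmia": ["cardio"],
}

# Hypothetical disease-to-phenotype annotations.
DISEASES = {
    "disease_A": {"seizure", "ataxia"},
    "disease_B": {"arrhythmia"},
    "disease_C": {"seizure", "arrhythmia"},
    "disease_D": {"ataxia"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(t) = -log(fraction of diseases annotated with t or a descendant)."""
    count = sum(
        1
        for terms in DISEASES.values()
        if any(term in ancestors(t) for t in terms)
    )
    # Simplification: terms that annotate no disease get IC 0.0.
    return -math.log(count / len(DISEASES)) if count else 0.0


def resnik(t1, t2):
    """Assumed term similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(information_content(t) for t in common)


def directed_similarity(source, target):
    """Average, over source terms, of the best-matching target term."""
    return sum(max(resnik(s, t) for t in target) for s in source) / len(source)


def rank_diseases(patient_terms):
    """Rank diseases by the mean of the two directed similarities."""
    scores = {}
    for name, disease_terms in DISEASES.items():
        patient_to_disease = directed_similarity(patient_terms, disease_terms)
        disease_to_patient = directed_similarity(disease_terms, patient_terms)
        # Assumed integration step: unweighted mean of both directions.
        scores[name] = (patient_to_disease + disease_to_patient) / 2
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_diseases({"seizure", "ataxia"})
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In this toy setup, a patient with both `seizure` and `ataxia` ranks `disease_A` first because its annotations match exactly, while `disease_B` shares only the uninformative root term with the patient and ranks last. Specific terms carry more information content than general ones, which matches the abstract's observation that patients with more specific phenotypes are easier to predict.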