Genetically and semantically aware homogeneous network for prediction and scoring of comorbidities

Karpaga Priyaa Kartheeswaran, Arockia Xavier Annie Rayan, Geetha Thekkumpurath Varrieth
DOI: https://doi.org/10.1016/j.compbiomed.2024.109252
Abstract: Objective: Patients with comorbidities are at higher risk of mortality than those suffering from a single disease. Therefore, quantifying and predicting disease comorbidities is necessary to stratify patients' mortality risk, predict the probability of comorbidity occurrence, design treatment strategies, and prevent disease progression. Enriching comorbidity disease relationships with the rich semantics established by genetic components plays a vital role in effectively quantifying and predicting comorbidities. However, existing studies have not extensively explored the semantic richness conveyed by the different types of genetic links connecting comorbidity pairs. Methods: To address this, a novel genetic-semantic-aware weighted homogeneous network-based method, GSWHomoNet, is proposed. It first constructs the gene-enriched comorbidity heterogeneous network, CoGHetNet, with encoded genetic-semantic-aware weighted meta-path instance disease-pair embeddings, to obtain enhanced disease node embeddings of the network. For enhanced comorbidity prediction and scoring, both direct and indirect semantically enriched comorbidity relationships of the disease nodes are preserved while transforming the heterogeneous network into the homogeneous comorbidity network GSWHomoNet. GSWHomoNet not only helps discover comorbidity links transductively between known-known disease pairs but also improves inductive link prediction between known-unknown disease pairs by supplying unknown disease nodes with semantically enriched heterogeneous structural knowledge. Results: The effectiveness of the proposed components is demonstrated by AUC scores of 0.895 and 0.860 and AUPR scores of 0.903 and 0.873 for transductive and inductive link prediction, respectively. In comorbidity scoring, GSWHomoNet outperformed other methods with a correlation of 0.848.
The improved association prediction ability of node embeddings based on the genetic-semantic-aware weighted meta-path instance embedding is further demonstrated on disease-microbe and bibliographic heterogeneous network datasets. To assess the biological significance of GSWHomoNet-based comorbidity scoring, we compared it with gene, pathway, and protein-protein interaction (PPI) perspectives, which revealed a stronger correlation with the PPI aspect. We identified a substantial number of predicted comorbidity disease pairs, with 77,456 and 48,972 pairs supported by literature evidence for transductive and inductive predictions, respectively. Additionally, we highlighted shared pathways and PPIs for these pairs, demonstrating the robustness of the comorbidity predictions.
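
For readers unfamiliar with the link-prediction metrics quoted in the abstract, the following is a minimal sketch of how AUC and AUPR are commonly computed from scored disease pairs. It is not the authors' evaluation code. The function names `roc_auc` and `average_precision`, and the toy labels and scores, are hypothetical and chosen only for illustration; average precision is used here as one common step-wise estimate of AUPR, and the abstract does not state which estimator the paper uses.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive pair outscores
    a random negative pair (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of precision@k over the ranks k at which
    a true comorbidity pair appears. Tied scores keep their input order."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive pair")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Toy data (hypothetical, not from the paper): 1 = known comorbid pair,
# 0 = sampled non-comorbid pair; scores are predicted link probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print(round(roc_auc(labels, scores), 3))            # 0.889
print(round(average_precision(labels, scores), 3))  # 0.917
```

In a transductive setting both diseases of each scored pair would appear in the training network, while in an inductive setting one disease would be held out; the metric computation itself is the same in both cases.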