Knowledge Graph Embeddings in the Biomedical Domain: Are They Useful? A Look at Link Prediction, Rule Learning, and Downstream Polypharmacy Tasks

Aryo Pradipta Gema, Dominik Grabarczyk, Wolf De Wulf, Piyush Borole, Javier Antonio Alfaro, Pasquale Minervini, Antonio Vergari, Ajitha Rajan
2023-08-31
Abstract: Knowledge graphs are powerful tools for representing and organising complex biomedical data. Several knowledge graph embedding algorithms have been proposed to learn from and complete knowledge graphs. However, a recent study demonstrates the limited efficacy of these embedding algorithms when applied to biomedical knowledge graphs, raising the question of whether knowledge graph embeddings have limitations in biomedical settings. This study aims to apply state-of-the-art knowledge graph embedding models in the context of a recent biomedical knowledge graph, BioKG, and evaluate their performance and potential downstream uses. We achieve a three-fold improvement in terms of performance based on the HITS@10 score over previous work on the same biomedical knowledge graph. Additionally, we provide interpretable predictions through a rule-based method. We demonstrate that knowledge graph embedding models are applicable in practice by evaluating the best-performing model on four tasks that represent real-life polypharmacy situations. Results suggest that knowledge learnt from large biomedical knowledge graphs can be transferred to such downstream use cases. Our code is available at https://github.com/aryopg/biokge.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper explores the effectiveness and application potential of Knowledge Graph Embeddings (KGE) in the biomedical field. Specifically, it focuses on the following aspects:

1. **Evaluating the performance of KGE on biomedical knowledge graphs**:
   - The authors use a recent biomedical knowledge graph, BioKG, to evaluate the performance of several state-of-the-art KGE models on link prediction.
   - By comparing against previous results on the same knowledge graph, the paper shows how much these models improve on earlier work.
2. **Exploring the application of KGE in downstream tasks**:
   - The study investigates whether pre-trained KGE models can be transferred effectively to four practical polypharmacy tasks, testing the feasibility and effectiveness of KGE models in real-world applications.
3. **Improving the interpretability of predictions**:
   - A rule-learning method (AnyBURL) is used to provide interpretable predictions, which is particularly important in the biomedical field.

### Main Contributions

1. **Performance Improvement**:
   - The best KGE model (ComplEx) achieved substantially higher HITS@10 and Mean Reciprocal Rank (MRR) scores than previous work. For example, the HITS@10 of ComplEx increased from 0.286 to 0.793.
2. **Interpretability of Rule Learning**:
   - AnyBURL not only achieved a competitive HITS@10 score (0.677) but also produced interpretable rules that help explain its predictions.
3. **Application in Downstream Tasks**:
   - The pre-trained KGE models transferred well to the four polypharmacy tasks, supporting the feasibility of the transfer learning paradigm. For tasks with less data in particular (such as DPI-FDA), the pre-trained models significantly improved performance and training efficiency.

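The HITS@10 and MRR scores quoted above are both derived from the rank a model assigns to the correct entity among all candidates. The following is a minimal sketch of how such rank-based metrics can be computed. The function names, the pessimistic tie handling, and the toy ranks are assumptions made for illustration; they are not taken from the paper's code, and the sketch omits the filtering of known true triples that link prediction benchmarks commonly apply.

```python
from typing import Sequence


def rank_of_true(scores: Sequence[float], true_index: int) -> int:
    """Return the 1-based rank of the true candidate.

    Assumption: ties are resolved pessimistically, so every other
    candidate with an equal score is ranked ahead of the true one.
    """
    true_score = scores[true_index]
    ahead = sum(
        1 for i, s in enumerate(scores) if i != true_index and s >= true_score
    )
    return ahead + 1


def hits_at_k(ranks: Sequence[int], k: int = 10) -> float:
    """Fraction of test queries whose true entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all test queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)


if __name__ == "__main__":
    # Hypothetical ranks for four test triples (not results from the paper).
    toy_ranks = [1, 3, 12, 2]
    print(hits_at_k(toy_ranks, k=10))       # 3 of 4 ranks are <= 10
    print(mean_reciprocal_rank(toy_ranks))  # (1 + 1/3 + 1/12 + 1/2) / 4
```

Under these definitions, a HITS@10 of 0.793 would mean that the correct entity appears among the ten highest-scoring candidates for roughly 79% of test queries. MRR rewards placing the correct entity near the very top, since a rank of 1 contributes 1.0 while a rank of 10 contributes only 0.1.
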
### Conclusion

Through comprehensive evaluation and experiments, the paper demonstrates the effectiveness and potential application value of KGE models in biomedical knowledge graphs. Particularly in link prediction and downstream tasks, the pre-trained KGE models showed significant advantages, providing new tools and methods for biomedical research.