Abstract: Knowledge graphs (KGs) are an important tool for representing complex relationships between entities in the biomedical domain. Several methods have been proposed for learning embeddings that can be used to predict new links in such graphs. Some methods ignore valuable attribute data associated with entities in biomedical KGs, such as protein sequences or molecular graphs. Other works incorporate such data but assume that all entities can be represented with the same data modality. This is not always the case for biomedical KGs, where entities exhibit heterogeneous modalities that are central to their representation in the subject domain. We propose a modular framework for learning embeddings in KGs with entity attributes that allows encoding attribute data of different modalities while also supporting entities with missing attributes. We additionally propose an efficient pretraining strategy for reducing the required training runtime. We train models using a biomedical KG containing approximately 2 million triples and evaluate the resulting entity embeddings on the tasks of link prediction and drug-protein interaction prediction, comparing against methods that do not take attribute data into account. In the standard link prediction evaluation, the proposed method achieves competitive, yet lower, performance than baselines that do not use attribute data. When evaluated on the task of drug-protein interaction prediction, the method compares favorably with the baselines. We find settings involving low-degree entities, which make up a substantial share of the entities in the KG, where our method outperforms the baselines. Our proposed pretraining strategy yields significantly higher performance while reducing the required training runtime. Our implementation is available at https://github.com/elsevier-AI-Lab/BioBLP.
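
The modular idea described in the abstract (one encoder per attribute modality, with support for entities that lack attributes) can be illustrated with a minimal sketch. Everything below is a hypothetical toy and not the BioBLP implementation: the encoder functions, the `EntityEmbedder` class, the embedding dimension, and the TransE-style scoring function are assumptions chosen only to make the structure concrete. The abstract does not specify which encoders or scoring functions are used.

```python
import random

DIM = 4  # embedding dimension (illustrative assumption)


def encode_sequence(seq):
    """Toy protein-sequence encoder (assumption): bucketed character counts."""
    vec = [0.0] * DIM
    for ch in seq:
        vec[ord(ch) % DIM] += 1.0
    return [v / max(len(seq), 1) for v in vec]


def encode_molecule(edges):
    """Toy molecular-graph encoder (assumption): bucketed node-degree histogram."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    vec = [0.0] * DIM
    for d in degree.values():
        vec[d % DIM] += 1.0
    return [v / max(len(degree), 1) for v in vec]


class EntityEmbedder:
    """Routes each entity to a modality-specific encoder, or to a lookup
    table when the entity has no attribute data (hypothetical design)."""

    def __init__(self, encoders, seed=0):
        self.encoders = encoders  # maps modality name -> encoder function
        self.lookup = {}          # fallback embeddings keyed by entity id
        self.rng = random.Random(seed)

    def embed(self, entity_id, modality=None, attribute=None):
        # Use the attribute encoder when both modality and data are available.
        if modality in self.encoders and attribute is not None:
            return self.encoders[modality](attribute)
        # Otherwise fall back to a per-entity embedding, created on first use.
        if entity_id not in self.lookup:
            self.lookup[entity_id] = [self.rng.uniform(-1, 1) for _ in range(DIM)]
        return self.lookup[entity_id]


def transe_score(h, r, t):
    """TransE-style score (swapped in for illustration): -||h + r - t||_1."""
    return -sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


embedder = EntityEmbedder({"protein": encode_sequence, "molecule": encode_molecule})
drug = embedder.embed("drug:1", "molecule", [(0, 1), (1, 2)])
protein = embedder.embed("protein:1", "protein", "MKTAYIAK")
disease = embedder.embed("disease:1")  # no attributes: lookup fallback
relation = [0.0] * DIM                 # placeholder relation vector
print(transe_score(drug, relation, protein))
```

In a trained model the encoders and the lookup table would be learned jointly from the triples; here they are fixed or random so that the routing logic stays visible.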

Benchmark and Best Practices for Biomedical Knowledge Graph Embeddings

Knowledge Graph Embeddings in the Biomedical Domain: Are They Useful? A Look at Link Prediction, Rule Learning, and Downstream Polypharmacy Tasks

Application and evaluation of knowledge graph embeddings in biomedical data

Know2BIO: A Comprehensive Dual-View Benchmark for Evolving Biomedical Knowledge Graphs

PharmKG: a dedicated knowledge graph benchmark for bomedical data mining

Snomed2Vec: Random Walk and Poincaré Embeddings of a Clinical Knowledge Base for Healthcare Analytics

BioBridge: Bridging Biomedical Foundation Models via Knowledge Graphs

The Role of Graph Topology in the Performance of Biomedical Knowledge Graph Completion Models

Biomedical Knowledge Graph Refinement and Completion using Graph Representation Learning and Top-K Similarity Measure

A Survey on Knowledge Graph Embedding: Approaches, Applications and Benchmarks

Medical knowledge graph completion via fusion of entity description and type information

Semantic Health Knowledge Graph: Semantic Integration of Heterogeneous Medical Knowledge and Services

Efficient Medical Knowledge Graph Embedding: Leveraging Adaptive Hierarchical Transformers and Model Compression

MedGraph: A semantic biomedical information retrieval framework using knowledge graph embedding for PubMed

BioBLP: A Modular Framework for Learning on Multimodal Biomedical Knowledge Graphs

Biolink Model: A Universal Schema for Knowledge Graphs in Clinical, Biomedical, and Translational Science

Universal Knowledge Graph Embeddings

Generating Biomedical Knowledge Graphs from Knowledge Bases, Registries, and Multiomic Data

Transformers and the Representation of Biomedical Background Knowledge

MedGraph: An experimental semantic information retrieval method using knowledge graph embedding for the biomedical citations indexed in PubMed

Biomedical Multi-hop Question Answering Using Knowledge Graph Embeddings and Language Models