Universal Knowledge Graph Embeddings

N'Dah Jean Kouagou, Caglar Demir, Hamada M. Zahera, Adrian Wilke, Stefan Heindorf, Jiayi Li, Axel-Cyrille Ngonga Ngomo
DOI: https://doi.org/10.1145/3589335.3651978
2024-07-05
Abstract: A variety of knowledge graph embedding approaches have been developed. Most of them obtain embeddings by learning the structure of the knowledge graph within a link prediction setting. As a result, the embeddings reflect only the structure of a single knowledge graph, and embeddings for different knowledge graphs are not aligned, e.g., they cannot be used to find similar entities across knowledge graphs via nearest neighbor search. However, knowledge graph embedding applications such as entity disambiguation require a more global representation, i.e., a representation that is valid across multiple sources. We propose to learn universal knowledge graph embeddings from large-scale interlinked knowledge sources. To this end, we fuse large knowledge graphs based on the owl:sameAs relation such that every entity is represented by a unique identity. We instantiate our idea by computing universal embeddings based on DBpedia and Wikidata yielding embeddings for about 180 million entities, 15 thousand relations, and 1.2 billion triples. We believe our computed embeddings will support the emerging field of graph foundation models. Moreover, we develop a convenient API to provide embeddings as a service. Experiments on link prediction suggest that universal knowledge graph embeddings encode better semantics compared to embeddings computed on a single knowledge graph. For reproducibility purposes, we provide our source code and datasets open access.
Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the incompatibility of embeddings that knowledge graph embedding (KGE) models learn on different knowledge graphs. Existing KGE methods usually capture the structure of a single knowledge graph only, whereas applications such as entity disambiguation need a global representation that is valid across multiple sources. To this end, the authors propose learning universal knowledge graph embeddings: they fuse large-scale interlinked knowledge graphs into a single graph and use the owl:sameAs relation to map all matched entities to one unique ID, which yields a unified embedding space. Assigning a single ID to each set of matched entities reduces memory consumption and computational cost, and also helps address knowledge graph incompleteness.

The authors evaluate the approach with four KGE models (DistMult, ComplEx, QMult, and ConEx). In the link prediction task, the universal embeddings perform better than embeddings computed on the individual knowledge graphs, particularly with ConEx. The authors also provide an API that serves the embeddings, and they release their code and datasets to support reproducibility. Related work discussed in the paper includes multilingual knowledge graph embedding, bootstrapping strategies based on matching scores, and methods that use additional entity attributes.

In summary, the paper addresses how to learn universal entity embeddings that integrate information from multiple knowledge graphs, with the aim of improving the performance and practicality of knowledge graph applications.
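The fusion step described above can be illustrated with a small sketch. It is not the authors' implementation; it only assumes that owl:sameAs links are available as pairs of entity identifiers and that triples are (head, relation, tail) tuples. The function names `find` and `fuse`, the choice of the lexicographically smallest identifier as the canonical ID, and the toy identifiers are all hypothetical. A union-find structure groups linked entities, and every triple is then rewritten to use one ID per group.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def find(parent: Dict[str, str], x: str) -> str:
    """Return the canonical ID of x, registering x if it is unseen."""
    parent.setdefault(x, x)
    root = x
    while parent[root] != root:
        root = parent[root]
    # Path compression: point every node on the path directly at the root.
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root


def fuse(triples: Iterable[Triple],
         same_as: Iterable[Tuple[str, str]]) -> List[Triple]:
    """Merge entities connected by sameAs links and rewrite the triples.

    Assumption for illustration: the canonical ID of a group is its
    lexicographically smallest member.
    """
    parent: Dict[str, str] = {}
    for a, b in same_as:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra  # the smaller identifier stays the root
    # A set removes triples that become identical after merging.
    fused = {(find(parent, h), r, find(parent, t)) for h, r, t in triples}
    return sorted(fused)


# Toy data with made-up identifiers, not taken from DBpedia or Wikidata dumps.
same_as = [("dbr:Berlin", "wd:Q64")]
triples = [
    ("dbr:Berlin", "capitalOf", "dbr:Germany"),
    ("wd:Q64", "capitalOf", "dbr:Germany"),
    ("wd:Q64", "locatedIn", "dbr:Europe"),
]
fused = fuse(triples, same_as)
print(fused)
```

In this toy case the two "capitalOf" triples collapse into one, and the "locatedIn" fact stated only for the second identifier becomes attached to the merged entity, which is the intuition behind the reduced graph size and the mitigation of incompleteness mentioned above.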