AutoAlign: Fully Automatic and Effective Knowledge Graph Alignment enabled by Large Language Models

Rui Zhang, Yixin Su, Bayu Distiawan Trisedya, Xiaoyan Zhao, Min Yang, Hong Cheng, Jianzhong Qi
2023-11-13
Abstract: The task of entity alignment between knowledge graphs (KGs) aims to identify every pair of entities from two different KGs that represent the same entity. Many machine learning-based methods have been proposed for this task. However, to the best of our knowledge, existing methods all require manually crafted seed alignments, which are expensive to obtain. In this paper, we propose the first fully automatic alignment method, named AutoAlign, which does not require any manually crafted seed alignments. Specifically, for predicate embeddings, AutoAlign constructs a predicate-proximity-graph with the help of large language models to automatically capture the similarity between predicates across two KGs. For entity embeddings, AutoAlign first computes the entity embeddings of each KG independently using TransE, and then shifts the two KGs' entity embeddings into the same vector space by computing the similarity between entities based on their attributes. Thus, both predicate alignment and entity alignment can be done without manually crafted seed alignments. AutoAlign is not only fully automatic, but also highly effective. Experiments using real-world KGs show that AutoAlign improves the performance of entity alignment significantly compared to state-of-the-art methods.
Information Retrieval, Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper addresses entity alignment between knowledge graphs (KGs): identifying pairs of entities in two different KGs that refer to the same real-world entity. Its goal is a fully automatic alignment method, AutoAlign, that requires no manually labeled seed alignments. Existing methods typically depend on a large number of manually labeled seed alignments, which are time-consuming and costly to produce and do not transfer well across tasks.

The paper presents two main contributions:

1. **Automatic Predicate Alignment**: AutoAlign constructs a predicate-proximity-graph and uses large language models (such as ChatGPT) to automatically capture the similarity between predicates in the two knowledge graphs.
2. **Automatic Entity Alignment**: AutoAlign computes entity embeddings for each knowledge graph independently, then shifts the embeddings from both graphs into the same vector space based on the attribute similarity of their entities.

Together, these techniques let AutoAlign align both predicates and entities without any manually labeled data. Experiments on real-world KGs show that AutoAlign significantly outperforms existing methods that require manual labeling on the entity alignment task.
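
The abstract names TransE as the model used to embed each KG independently. As a rough illustration of what TransE optimizes (not the authors' implementation), here is a minimal sketch in plain Python. The function names `transe_score` and `margin_loss`, the 2-dimensional toy vectors, and the margin value of 1.0 are all assumptions chosen for readability.

```python
import math


def transe_score(head, relation, tail):
    """TransE distance ||head + relation - tail||_2.

    A smaller value means the triple (head, relation, tail) is more plausible.
    """
    return math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


def margin_loss(positive_score, negative_score, margin=1.0):
    """Margin ranking loss: push true triples at least `margin` closer
    than corrupted ones. The margin value here is an illustrative assumption."""
    return max(0.0, margin + positive_score - negative_score)


# Toy 2-D embeddings (hypothetical values, for illustration only).
head = [1.0, 0.0]
relation = [0.0, 1.0]
true_tail = [1.0, 1.0]        # head + relation lands exactly here
corrupt_tail = [-1.0, -1.0]   # a randomly substituted, wrong tail

pos = transe_score(head, relation, true_tail)
neg = transe_score(head, relation, corrupt_tail)
loss = margin_loss(pos, neg)

print(f"positive score: {pos:.3f}")
print(f"negative score: {neg:.3f}")
print(f"margin loss:    {loss:.3f}")
```

In this toy case the true triple scores lower than the corrupted one by more than the margin, so the loss is zero and no update would be needed. Training on real KGs repeats this comparison over many sampled triples; how AutoAlign then merges the two embedding spaces via attribute similarity is described in the paper itself and is not reproduced here.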