Unlocking the Power of Large Language Models for Entity Alignment

Xuhui Jiang, Yinghan Shen, Zhichao Shi, Chengjin Xu, Wei Li, Zixuan Li, Jian Guo, Huawei Shen, Yuanzhuo Wang
2024-10-09
Abstract: Entity Alignment (EA) is vital for integrating diverse knowledge graph (KG) data, playing a crucial role in data-driven AI applications. Traditional EA methods primarily rely on comparing entity embeddings, but their effectiveness is constrained by the limited input KG data and the capabilities of the representation learning techniques. Against this backdrop, we introduce ChatEA, an innovative framework that incorporates large language models (LLMs) to improve EA. To address the constraints of limited input KG data, ChatEA introduces a KG-code translation module that translates KG structures into a format understandable by LLMs, thereby allowing LLMs to utilize their extensive background knowledge to improve EA accuracy. To overcome the over-reliance on entity embedding comparisons, ChatEA implements a two-stage EA strategy that capitalizes on LLMs' capability for multi-step reasoning in a dialogue format, thereby enhancing accuracy while preserving efficiency. Our experimental results verify ChatEA's superior performance, highlighting LLMs' potential in facilitating EA tasks.
Computation and Language, Artificial Intelligence
### What problems does this paper attempt to solve?

This paper addresses the limitations of existing methods for the entity alignment (EA) task. Traditional EA methods rely mainly on comparing entity embeddings, and they face two challenges when dealing with highly heterogeneous knowledge graphs (KGs):

1. **Limited input KG data**: Traditional methods rely only on the input KG data and lack sufficient contextual information, so they perform poorly on complex and diverse KGs.
2. **Over-reliance on entity embedding comparison**: Traditional methods usually determine similarity by directly comparing entity embeddings. This approach lacks transparency and an explicit reasoning process, and it struggles to capture the complex relationships between entities in complex KG pairs.

To solve these problems, the authors introduce the **ChatEA** framework, which uses large language models (LLMs) to improve EA. The main contributions of ChatEA are as follows:

- **A KG-code translation module**: Converts the KG structure into a format that LLMs can understand, so that LLMs can draw on their extensive background knowledge to improve EA accuracy.
- **A two-stage EA strategy**: Uses the multi-step reasoning ability of LLMs to perform reasoning and re-evaluation in a dialogue format, improving accuracy and transparency while preserving efficiency.

With these two components, ChatEA overcomes the limitations of traditional methods and performs significantly better than existing state-of-the-art methods on multiple benchmark datasets.
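The two contributions above can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration, not the authors' implementation: the snippet format produced by `kg_to_code`, the use of embedding similarity to shortlist candidates in the first stage, and the `overlap_judge` function, which is a toy stand-in for the dialogue with an LLM in the second stage.

```python
import math
import re
from typing import Callable

Triple = tuple[str, str, str]  # assumed (head, relation, tail) layout


def kg_to_code(entity: str, triples: list[Triple]) -> str:
    """Render the facts that mention `entity` as a code-like text snippet.

    The output format is hypothetical; the summary does not specify it.
    """
    lines = [f"entity = {entity!r}", "facts = ["]
    for head, relation, tail in triples:
        if entity in (head, tail):
            lines.append(f"    ({head!r}, {relation!r}, {tail!r}),")
    lines.append("]")
    return "\n".join(lines)


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def overlap_judge(source_code: str, target_code: str) -> float:
    """Toy stand-in for the LLM: count quoted names shared by both snippets."""
    quoted = lambda text: set(re.findall(r"'([^']*)'", text))
    return float(len(quoted(source_code) & quoted(target_code)))


def two_stage_align(
    source: str,
    source_vec: list[float],
    source_triples: list[Triple],
    target_vecs: dict[str, list[float]],
    target_triples: list[Triple],
    judge: Callable[[str, str], float],
    top_k: int = 2,
) -> str:
    # Stage 1 (assumed): shortlist targets by embedding similarity.
    ranked = sorted(
        target_vecs,
        key=lambda name: cosine(source_vec, target_vecs[name]),
        reverse=True,
    )
    shortlist = ranked[:top_k]
    # Stage 2: re-evaluate only the shortlist using the code-like snippets.
    source_code = kg_to_code(source, source_triples)
    return max(
        shortlist,
        key=lambda name: judge(source_code, kg_to_code(name, target_triples)),
    )


# Hypothetical toy data: the embeddings alone prefer the wrong candidate.
kg1_triples = [("Paris", "capital_of", "France")]
kg2_triples = [
    ("Paris_(France)", "capital_of", "France"),
    ("Paris_(Texas)", "located_in", "Texas"),
]
kg2_vecs = {
    "Paris_(Texas)": [1.0, 0.0],
    "Paris_(France)": [0.9, 0.3],
    "Berlin": [0.0, 1.0],
}

result = two_stage_align(
    "Paris", [1.0, 0.0], kg1_triples, kg2_vecs, kg2_triples, overlap_judge
)
print(result)
```

In this toy setup the first stage ranks `Paris_(Texas)` highest, and the second stage selects `Paris_(France)` because its snippet shares facts with the source entity. Restricting the second stage to a short candidate list is one plausible way to keep the number of LLM calls small, which matches the efficiency goal stated above.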
### Formula representation

The task can be stated formally as follows:

- **Knowledge graph (KG)**: \( \text{KG} = (E, R, F) \), where \( E \) is the set of entities, \( R \) is the set of relations, and \( F \) is the set of facts.
- **Goal of the entity alignment task**: Given two KGs, \( \text{KG}_1 = (E_1, R_1, F_1) \) and \( \text{KG}_2 = (E_2, R_2, F_2) \), the goal is to identify the set of equivalent entity pairs \( S = \{(e_i, e_j) \mid e_i \in E_1, e_j \in E_2, e_i \equiv e_j\} \), where \( e_i \equiv e_j \) means that both entities refer to the same real-world object.
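The definitions above map directly onto a small data structure. The following sketch assumes that each fact is a `(head, relation, tail)` triple, which the summary does not state explicitly. The class name `KG`, the helper `is_candidate_alignment`, and the example entities are hypothetical.

```python
from dataclasses import dataclass

Fact = tuple[str, str, str]  # assumed (head, relation, tail) layout


@dataclass(frozen=True)
class KG:
    entities: frozenset[str]   # E
    relations: frozenset[str]  # R
    facts: frozenset[Fact]     # F

    def is_well_formed(self) -> bool:
        """Check that every fact uses only known entities and relations."""
        return all(
            h in self.entities and r in self.relations and t in self.entities
            for h, r, t in self.facts
        )


def is_candidate_alignment(pairs: set[tuple[str, str]], kg1: KG, kg2: KG) -> bool:
    """Check that every pair (e_i, e_j) has e_i in E_1 and e_j in E_2.

    This verifies only the shape of S; whether the two entities are
    truly equivalent is what an EA method has to decide.
    """
    return all(a in kg1.entities and b in kg2.entities for a, b in pairs)


kg1 = KG(
    entities=frozenset({"Paris", "France"}),
    relations=frozenset({"capital_of"}),
    facts=frozenset({("Paris", "capital_of", "France")}),
)
kg2 = KG(
    entities=frozenset({"Paris_(France)", "French_Republic"}),
    relations=frozenset({"is_capital_of"}),
    facts=frozenset({("Paris_(France)", "is_capital_of", "French_Republic")}),
)

# A hypothetical alignment set S for the two toy graphs.
S = {("Paris", "Paris_(France)"), ("France", "French_Republic")}

# Number of entity pairs an exhaustive comparison would have to consider.
search_space = len(kg1.entities) * len(kg2.entities)
```

The alignment set is a subset of \( E_1 \times E_2 \), so the number of pairs to consider grows with the product of the two entity counts.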