On Entity Alignment at Scale

Zeng Weixin, Zhao Xiang, Li Xinyi, Tang Jiuyang, Wang Wei
DOI: https://doi.org/10.1007/s00778-021-00703-3
2022-01-01
The VLDB Journal
Abstract: Knowledge graphs (KGs), as an effective approach to organizing and storing data, have received growing attention over the last decade. A KG can hardly reach completeness, since large amounts of new data are always emerging. To increase the scale and coverage of KGs, a possible solution is to incorporate data from other KGs, and entity alignment (EA) plays a vital role in this process. EA is the task of detecting entities that refer to the same real-world object but come from different KGs. Although many approaches have been put forward to tackle this task, they are mostly evaluated on small datasets and cannot deal with large-scale data in practice. In this work, we study the task of EA at scale and put forward a novel solution that can manage large-scale KG pairs while achieving promising alignment performance. First, we devise seed-oriented graph partition strategies to divide large-scale KG pairs into smaller subgraph pairs. Next, within each subgraph pair, we learn unified entity representations using existing methods and conceive a novel reciprocal alignment inference strategy to model the bi-directional alignment interactions, which can lead to more accurate alignment results. To further improve the scalability of reciprocal alignment inference, we put forward two variant strategies that significantly reduce memory and time costs at the expense of a small drop in effectiveness. Our proposal is generic and can be applied to existing representation learning-based EA models to improve their ability to handle large-scale KG pairs. Finally, we build a new EA dataset with millions of entities and conduct detailed experiments to validate that our proposed model can effectively cope with EA at scale. We also evaluate our proposed model against state-of-the-art baselines on popular EA datasets, and the extensive experiments demonstrate its effectiveness and superiority.
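To make the idea of bi-directional alignment inference more concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions, not the method from the paper: the function names `greedy_align` and `reciprocal_align` are hypothetical, the input is assumed to be a precomputed source-by-target similarity matrix, and the scoring rule (how far a pair falls below the best option seen from the source side plus how far it falls below the best option seen from the target side) is one simple way to combine both directions.

```python
def greedy_align(sim):
    """One-directional baseline: each source entity picks its most similar target."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in sim]


def reciprocal_align(sim):
    """Illustrative bi-directional inference (an assumption, not the paper's formula).

    A pair (i, j) is scored by how close sim[i][j] is to the best match
    for source i AND to the best match for target j, so a target that
    strongly prefers another source is penalised.
    """
    n, m = len(sim), len(sim[0])
    row_max = [max(row) for row in sim]                               # best score per source
    col_max = [max(sim[i][j] for i in range(n)) for j in range(m)]    # best score per target

    def score(i, j):
        source_side = sim[i][j] - row_max[i]   # 0 if j is i's favourite target
        target_side = sim[i][j] - col_max[j]   # 0 if i is j's favourite source
        return source_side + target_side

    return [max(range(m), key=lambda j: score(i, j)) for i in range(n)]


# Toy similarity matrix: rows are source entities, columns are target entities.
SIM = [
    [0.80, 0.78],
    [0.95, 0.30],
]
```

On this toy matrix, `greedy_align(SIM)` maps both source entities to target 0, whereas `reciprocal_align(SIM)` returns `[1, 0]`: source 0 is steered to target 1 because target 0 matches source 1 far better. Note that the full similarity matrix is held in memory here, which is the kind of cost that grows quickly with KG size.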