OAG: Linking Entities across Large-scale Heterogeneous Knowledge Graphs
Fanjin Zhang, Xiao Liu, Jie Tang, Yuxiao Dong, Peiran Yao, Jie Zhang, Xiaotao Gu, Yan Wang, Evgeny Kharlamov, Bin Shao, Rui Li, Kuansan Wang
DOI: https://doi.org/10.1109/tkde.2022.3222168
IF: 9.235
2022-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Different knowledge graphs for the same domain are often uniquely housed on the Web. Effectively linking entities from different graphs is critical for building an open and comprehensive knowledge graph. However, linking entities across different sources has thus far faced various challenges, including the increasingly large-scale volume of the data, the heterogeneity of the graphs, and the ambiguity of real-world entities. To address them, we propose a unified framework LinKG. Specifically, we decouple the problem into different linking tasks based on the unique properties of each type of entity. To link word sequence based entities, we propose an LSTM-based method to capture word dependencies. To link entities of large scale, we utilize the hashing technique and convolutional neural networks for scalable and accurate linking. To link ambiguous entities, we propose heterogeneous graph attention networks to leverage heterogeneous structural information. Finally, to validate the design choices of different LinKG modules, we characterize the relationships between different tasks based on the single-domain and multi-domain transfer models. Extensive experiments demonstrate the effectiveness of LinKG with an overall F1-score of 95.15%, based on which we deploy and release the Open Academic Graph (OAG), the largest publicly available heterogeneous academic graph to date.
computer science, information systems, artificial intelligence, engineering, electrical & electronic