MORE: Toward Improving Author Name Disambiguation in Academic Knowledge Graphs
Jibing Gong, Xiaohan Fang, Jiquan Peng, Yi Zhao, Jinye Zhao, Chenlong Wang, Yangyang Li, Jingyi Zhang, Steve Drew
DOI: https://doi.org/10.1007/s13042-022-01686-5
2022-11-28
International Journal of Machine Learning and Cybernetics
Abstract: Author name disambiguation (AND) is a fundamental task in knowledge alignment for building a knowledge graph network or an online academic search system. Existing AND algorithms tend to over-split and over-merge papers, severely jeopardizing the performance of downstream tasks. In this paper, we demonstrate the problem of paper over-splitting and over-merging when constructing an academic knowledge graph. To address these problems, we systematically investigate and propose a unified architecture, MORE, which utilizes LightGBM and HAC FOR paper clusteRing as well as HGAT for both cluster alignmEnt and knowledge graph representation learning. Specifically, we first propose a novel representation learning method which leverages OAG-BERT to learn paper entity embeddings and utilizes SimCSE to regularize the anisotropic space of the pre-trained embeddings. We then apply LightGBM to calculate the similarity matrix of papers from the entity embeddings. We also use hierarchical agglomerative clustering (HAC) for grouping clusters to alleviate over-merging. Finally, considering co-author relationships, we improve the HGAT model using a hard-cross graph attention mechanism to generate semantic and structural embeddings. Experimental results on two large real-world datasets show that our proposed method achieves a 6%∼16% improvement in F1-score over the baseline models.
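The clustering step described in the abstract (a pairwise paper similarity matrix followed by HAC) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `average_linkage_clusters`, the `min_similarity` stopping threshold, the choice of average linkage, and the hand-written toy matrix (standing in for LightGBM similarity scores) are all assumptions made for illustration.

```python
def average_linkage_clusters(similarity, min_similarity=0.5):
    """Naive average-linkage HAC over a square, symmetric similarity matrix.

    `similarity[i][j]` is assumed to be a score in [0, 1] for papers i and j.
    Merging stops once the best remaining pair of clusters scores below
    `min_similarity` (a hypothetical threshold, not taken from the paper).
    """
    clusters = [[i] for i in range(len(similarity))]

    def avg_sim(a, b):
        # Mean pairwise similarity between the members of two clusters.
        return sum(similarity[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best_score, best_pair = None, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = avg_sim(clusters[x], clusters[y])
                if best_score is None or score > best_score:
                    best_score, best_pair = score, (x, y)
        if best_score < min_similarity:
            break  # no remaining pair is similar enough to merge
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Toy scores: papers 0-2 look like one author, paper 3 like another.
toy_similarity = [
    [1.00, 0.90, 0.80, 0.10],
    [0.90, 1.00, 0.85, 0.20],
    [0.80, 0.85, 1.00, 0.15],
    [0.10, 0.20, 0.15, 1.00],
]
clusters = average_linkage_clusters(toy_similarity)
print(clusters)
```

In this sketch, a higher threshold yields more, smaller clusters (a risk of over-splitting), while a lower one merges more aggressively (a risk of over-merging), which is the trade-off the abstract refers to.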
computer science, artificial intelligence