Large Scale Name Disambiguation Using Rule-Based Post Processing Combined with Aminer

Lizhi Zhang, Zhijie Ban
DOI: https://doi.org/10.1007/978-981-32-9298-7_12
2019-01-01
Abstract: Author name ambiguity has long been a challenging problem in scientific literature management, and the rapid growth of academic publications in recent years has made author disambiguation on large-scale academic data an urgent problem. In this paper, we present a rule-based post-processing method combined with Aminer’s framework to address large-scale author name disambiguation. We first introduce Aminer’s disambiguation model. Building on this model, we propose an efficient post-processing algorithm that aims to improve disambiguation performance through rule-based clustering. The algorithm uses similarity features derived from metadata and implements two types of disambiguation rules. We evaluate the proposed post-processing method on large real-world data, and the experimental results show that it clearly outperforms the state-of-the-art Aminer [1] method (+11% in terms of F1-score).
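To make the idea of rule-based post-processing concrete, the following is a minimal sketch of how clusters produced by an upstream disambiguation model might be merged by metadata rules. The abstract does not specify the two rule types, so everything here is an assumption chosen for illustration: the function name `merge_clusters`, the `coauthors` and `venues` metadata fields, both rules (a shared-coauthor count, and venue overlap backed by at least one shared coauthor), and the threshold values. It should not be read as the authors' algorithm.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_clusters(clusters, papers, min_shared_coauthors=2, min_venue_sim=0.5):
    """Merge initial clusters with two HYPOTHETICAL rules (not from the paper).

    clusters: list of lists of paper ids (output of an upstream model)
    papers:   dict mapping paper id -> {"coauthors": set, "venues": set}
    """
    parent = list(range(len(clusters)))  # union-find forest over cluster indices

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def pooled(cluster, key):
        # Union of one metadata field over all papers in a cluster.
        out = set()
        for pid in cluster:
            out |= papers[pid][key]
        return out

    # Features are computed once, from the initial clusters only.
    coauthors = [pooled(c, "coauthors") for c in clusters]
    venues = [pooled(c, "venues") for c in clusters]

    for i, j in combinations(range(len(clusters)), 2):
        shared = coauthors[i] & coauthors[j]
        # Assumed rule 1: enough coauthors in common.
        rule1 = len(shared) >= min_shared_coauthors
        # Assumed rule 2: at least one common coauthor plus similar venues.
        rule2 = bool(shared) and jaccard(venues[i], venues[j]) >= min_venue_sim
        if rule1 or rule2:
            parent[find(i)] = find(j)

    merged = {}
    for i, cluster in enumerate(clusters):
        merged.setdefault(find(i), []).extend(cluster)
    return sorted(sorted(c) for c in merged.values())


# Toy data, invented for this example.
papers = {
    "p1": {"coauthors": {"A", "B"}, "venues": {"KDD"}},
    "p2": {"coauthors": {"A", "B", "C"}, "venues": {"ICDM"}},
    "p3": {"coauthors": {"C"}, "venues": {"ICDM"}},
    "p4": {"coauthors": {"X"}, "venues": {"Nature"}},
}
result = merge_clusters([["p1"], ["p2"], ["p3"], ["p4"]], papers)
print(result)
```

In this toy run, p1 and p2 merge under the first assumed rule (two shared coauthors), p3 joins them under the second (one shared coauthor and identical venues), and p4 stays separate. Union-find is used here only so that merges are transitive; a real system at the scale described would also need blocking or indexing to avoid comparing every pair of clusters.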