Name Disambiguation Using Semantic Association Clustering

Hai Jin, Li Huang, Pingpeng Yuan
DOI: https://doi.org/10.1109/ICEBE.2009.16
2009-01-01
Abstract: Because of homonyms, abbreviations, and similar effects, name ambiguity is widespread on the Web and in electronic documents. For example, when heterogeneous literature databases are integrated, differing name conventions can cause different authors to be treated as the same author, and vice versa. Name ambiguity therefore makes the data noisy or even dirty and lowers the precision of information retrieval. In this paper, we present an approach named the semantic association based name disambiguation method (SAND) to resolve person name ambiguity. The basic idea of SAND is to explore the semantic associations between name entities and to cluster the entities according to those associations. Name entities that fall in the same cluster are then considered to be the same entity. We test SAND on data from CiteSeer, DBLP and Libra. The results show that SAND is an effective approach to the problem of name ambiguity.
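The abstract describes SAND only at a high level: measure how strongly name mentions are associated, then cluster them so that each cluster stands for one person. The Python sketch below illustrates that general idea under stated assumptions. It is not the paper's actual association measure or clustering algorithm. The mention fields (`coauthors`, `venue`), the Jaccard-overlap score, the `threshold` value, and the function names are all hypothetical choices made for illustration.

```python
from itertools import combinations


def features(mention):
    """Hypothetical feature set for one mention: its coauthors plus its venue."""
    return {("coauthor", c) for c in mention["coauthors"]} | {("venue", mention["venue"])}


def association(a, b):
    """Assumed association score: Jaccard overlap of two mentions' feature sets."""
    fa, fb = features(a), features(b)
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0


def cluster_mentions(mentions, threshold=0.2):
    """Single-link clustering: link every pair whose association meets the
    (hypothetical) threshold, then return the connected groups of mention ids."""
    parent = list(range(len(mentions)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if association(mentions[i], mentions[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, m in enumerate(mentions):
        groups.setdefault(find(i), []).append(m["id"])
    return sorted(groups.values())


# Hypothetical mentions of one ambiguous author name in three papers.
example_mentions = [
    {"id": "p1", "coauthors": ["J. Smith", "K. Lee"], "venue": "ICEBE"},
    {"id": "p2", "coauthors": ["K. Lee"], "venue": "ICEBE"},
    {"id": "p3", "coauthors": ["M. Chen"], "venue": "SIGIR"},
]

if __name__ == "__main__":
    print(cluster_mentions(example_mentions))  # [['p1', 'p2'], ['p3']]
```

Here `p1` and `p2` share a coauthor and a venue (association 2/3), so they are grouped as one person, while `p3` shares nothing with either and stays separate. Single-link grouping is used only because it is the simplest way to turn pairwise associations into clusters; a real system would likely need richer associations and a more conservative clustering rule.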