Chinese multi-document personal name disambiguation

Houfeng Wang, Zheng Mei
2005-01-01
Abstract: This paper presents a new approach to determining whether occurrences of a personal name of interest in different documents refer to the same entity. First, three vectors are formed for each text: a personal name Boolean vector denoting whether each personal name occurs in the text, a biographical word Boolean vector representing title, occupation and so forth, and a real-valued feature vector. Then, multi-document personal name coreference is resolved by combining a heuristic strategy based on the Boolean vectors with an agglomerative clustering algorithm based on the feature vectors. Experimental results on the Wang Gang corpus show that this approach performs well.
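The pipeline described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' method: the abstract does not say how the Boolean heuristic interacts with clustering, which similarity measure or linkage is used, or how the stopping threshold is chosen. Here the hypothetical heuristic links two documents when both their name vectors and their biographical vectors overlap, and the clustering uses cosine similarity with average linkage. The names `disambiguate`, `must_link`, `threshold`, and the toy data are all invented.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two real-valued feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def shares_bit(u, v):
    """True if two Boolean vectors have a 1 in the same position."""
    return any(a and b for a, b in zip(u, v))


def disambiguate(docs, threshold=0.5):
    """Group documents that (under this sketch's assumptions) mention the same person.

    docs: list of dicts with keys 'names' and 'bio' (Boolean vectors)
          and 'feat' (real-valued vector).
    Returns a sorted list of clusters, each a sorted list of document indices.
    """
    def must_link(i, j):
        # Assumed heuristic: a shared co-occurring name AND a shared biographical word.
        return (shares_bit(docs[i]["names"], docs[j]["names"])
                and shares_bit(docs[i]["bio"], docs[j]["bio"]))

    def similarity(a, b):
        pairs = [(i, j) for i in a for j in b]
        if any(must_link(i, j) for i, j in pairs):
            return 1.0  # the heuristic overrides the feature-vector score
        # Average linkage over feature-vector cosine similarity.
        return sum(cosine(docs[i]["feat"], docs[j]["feat"]) for i, j in pairs) / len(pairs)

    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        (x, y), best = max(
            (((x, y), similarity(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break  # no remaining pair is similar enough to merge
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return sorted(clusters)


# Invented toy data: four documents mentioning the same surface name.
toy_docs = [
    {"names": [1, 0, 0], "bio": [1, 0], "feat": [1.0, 0.0, 0.2]},
    {"names": [1, 1, 0], "bio": [1, 0], "feat": [0.1, 0.9, 0.0]},
    {"names": [0, 0, 1], "bio": [0, 1], "feat": [0.0, 0.1, 1.0]},
    {"names": [0, 0, 0], "bio": [0, 0], "feat": [0.0, 0.2, 0.9]},
]
toy_clusters = disambiguate(toy_docs)
print(toy_clusters)
```

In this toy run, documents 0 and 1 are joined by the Boolean heuristic even though their feature vectors differ, while documents 2 and 3 are joined by feature-vector similarity alone. That is one plausible reading of how the two signals could complement each other.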