Clustering Technique in Multi-Document Personal Name Disambiguation.

Chen Chen, Junfeng Hu, Houfeng Wang
DOI: https://doi.org/10.3115/1667884.1667897
2009-01-01
Abstract: This paper addresses multi-document personal name disambiguation with an agglomerative clustering approach. We start from an analysis of point-wise mutual information between features and the ambiguous name, which leads to a novel method for computing feature weights in clustering. We then propose a trade-off measure between within-cluster compactness and among-cluster separation as the criterion for stopping clustering. After that, we apply a labeling method to find representative features for each cluster. Finally, experiments on word-based clustering over a Chinese dataset show that the approach is effective.
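The pipeline outlined in the abstract (PMI-based feature weighting, agglomerative clustering with a compactness/separation stopping rule, then cluster labeling) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the standard PMI definition over document counts, average-link cosine similarity, a stopping score defined as mean within-cluster similarity minus mean between-cluster similarity, and labeling by the highest total feature weight. The function names (`pmi`, `cluster`, `tradeoff`, `label`) and the toy weights are hypothetical and are not the authors' method.

```python
import math
from itertools import combinations


def pmi(n_joint, n_feature, n_name, n_docs):
    """Standard PMI, log2(P(f, name) / (P(f) * P(name))), from document counts."""
    if n_joint == 0:
        return 0.0  # assumption: features never seen with the name get zero weight
    return math.log2((n_joint * n_docs) / (n_feature * n_name))


def cosine(u, v):
    """Cosine similarity between two sparse feature-weight dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def avg_link(a, b, vecs):
    """Average pairwise similarity between the documents of two clusters."""
    return sum(cosine(vecs[i], vecs[j]) for i in a for j in b) / (len(a) * len(b))


def tradeoff(clusters, vecs):
    """Hypothetical stopping score: within-cluster minus between-cluster similarity."""
    within = [cosine(vecs[i], vecs[j]) for c in clusters for i, j in combinations(c, 2)]
    between = [avg_link(a, b, vecs) for a, b in combinations(clusters, 2)]
    # Assumption: a partition with no within-cluster pairs has compactness 0.
    w = sum(within) / len(within) if within else 0.0
    b = sum(between) / len(between) if between else 0.0
    return w - b


def cluster(vecs):
    """Merge the closest clusters repeatedly; keep the partition with the best score."""
    clusters = [[i] for i in range(len(vecs))]
    best, best_score = [list(c) for c in clusters], tradeoff(clusters, vecs)
    while len(clusters) > 1:
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_link(clusters[p[0]], clusters[p[1]], vecs))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        score = tradeoff(clusters, vecs)
        if score > best_score:
            best, best_score = [list(c) for c in clusters], score
    return best


def label(members, vecs):
    """Pick the feature with the largest total weight in a cluster (an assumption)."""
    totals = {}
    for i in members:
        for f, w in vecs[i].items():
            totals[f] = totals.get(f, 0.0) + w
    return max(totals, key=totals.get)


# Toy documents mentioning the same ambiguous name; the weights are made up.
docs = [
    {"professor": 2.0, "university": 1.5},
    {"professor": 1.8, "lecture": 1.0},
    {"striker": 2.2, "goal": 1.4},
    {"striker": 2.0, "league": 0.9},
]

for members in cluster(docs):
    print(sorted(members), label(members, docs))
```

On this toy input the sketch groups the two "professor" documents apart from the two "striker" documents. Scoring every partition along the merge path and keeping the best one is only one way to realize a stopping rule; the paper's actual measure may differ.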