Multi-document Chinese Name Disambiguation Based on Latent Semantic Analysis

Chengrong Wu, Linghui Gong, Jianping Zeng
DOI: https://doi.org/10.1109/fskd.2010.5569867
2010-01-01
Abstract: Name disambiguation has received considerable attention as an important subtask of NLP (Natural Language Processing). Given many potential references to person entities, the goal is to determine, for each reference in context, the person entity it most likely refers to. However, much of the research in this field either focuses on name disambiguation within a single text or applies machine learning models to multiple documents without considering semantics. In this paper we propose a new algorithm based on LSA (Latent Semantic Analysis) for multi-document disambiguation of Chinese names. The method employs SVD (Singular Value Decomposition) to reduce the original high-dimensional text space to a comparatively lower-dimensional semantic space, and then clusters the candidate reference words in that semantic space to obtain the result. Experiments on a real-world dataset collected from a BBS site show that the proposed method produces reasonable results.
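The pipeline described in the abstract (SVD-based dimensionality reduction followed by clustering in the reduced space) can be illustrated with a minimal sketch. The abstract does not specify the term weighting, the number of retained dimensions, or the clustering algorithm, so everything below is an assumption made for illustration: raw term counts, k = 2, a greedy cosine-similarity clustering, and a tiny hand-made term-document matrix (`TOY_TERM_DOC`) in which each column stands for one document mentioning the same ambiguous name. The function names `lsa_embed` and `cluster_by_cosine` are hypothetical and not taken from the paper.

```python
import numpy as np

# Toy term-document matrix (rows = context terms, columns = documents that
# mention the same ambiguous name). Hypothetical data: documents 0-1 share one
# vocabulary, documents 2-3 share another.
TOY_TERM_DOC = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 3],
    [0, 0, 1, 1],
], dtype=float)


def lsa_embed(term_doc, k):
    """Map each document (column) to a k-dimensional latent vector via truncated SVD."""
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Document coordinates are the rows of V_k scaled by the top-k singular values.
    return vt[:k].T * s[:k]


def cluster_by_cosine(vectors, threshold):
    """Greedy one-pass clustering (an assumed stand-in for the paper's clustering step).

    Each vector joins the most similar existing cluster seed if the cosine
    similarity reaches the threshold; otherwise it starts a new cluster.
    """
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)
    seeds, labels = [], []
    for v in unit:
        sims = [float(v @ seed) for seed in seeds]
        if sims and max(sims) >= threshold:
            labels.append(int(np.argmax(sims)))
        else:
            seeds.append(v)
            labels.append(len(seeds) - 1)
    return labels
```

Calling `cluster_by_cosine(lsa_embed(TOY_TERM_DOC, k=2), threshold=0.5)` assigns the first two documents to one cluster and the last two to another, which in this toy setting corresponds to two distinct people sharing a name. On real data the choice of k and of the similarity threshold would need tuning, and Chinese text would first require word segmentation, which this sketch omits.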