Name disambiguation using many-to-one features

Yang Liu, Jing He, Conglei Yao
2007-01-01
Journal of Computational Information Systems
Abstract: As the Web grows rapidly, more and more information about entities appears on it, including profile information and web logs recording their ideas, activities, speeches, and so on. However, many entities, such as persons and locations, share the same name. This paper presents an approach that estimates the number of entities sharing a name by employing many-to-one features. The basic idea is that entities are unlikely to share all of their other features even if they have the same name. We list some strategies for selecting key features, present an approach for extracting these features from the Web, and combine them to estimate the number of entities. In addition, we give a method for identifying and filtering out fake information that would otherwise confuse the estimate.