Detection of Entity Mixture in Knowledge Bases Using Hierarchical Clustering.

Haihua Xie, Xiaoqing Lu, Zhi Tang, Xiaojun Huang
DOI: https://doi.org/10.1007/978-3-319-50496-4_24
2016-01-01
Abstract: Entity mixture in a knowledge base refers to a situation in which some attributes of one entity are mistakenly assigned to another entity. It often occurs among homonymous entities, which share the same value of the attribute "Name". Eliminating entity mixture is critical to ensuring data accuracy and validity for knowledge-based services. However, current research on entity disambiguation mainly focuses on determining the identity of entities mentioned in text during information extraction for building a knowledge base, while little work has been done to verify the information in an already built knowledge base. In this paper, we propose a generic method to detect mixed homonymous entities in a knowledge base using hierarchical clustering. Our methodology differentiates entities by detecting inconsistency among their attributes, based on an analysis of how their attribute values are distributed across the documents of a common corpus. Experiments on a data set from industry applications have been conducted to demonstrate the workflow of performing the clustering and detecting mixed entities in a knowledge base using our methodology.
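The idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the linkage criterion, or the stopping rule, so everything below is an assumption made for illustration: each attribute value is represented by the set of corpus document IDs in which it appears, values are compared with Jaccard distance, clusters are merged bottom-up with average linkage, and merging stops at a hypothetical threshold. The names `jaccard_distance`, `cluster_attribute_values`, and `is_mixed` are illustrative and not taken from the paper.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Distance between two sets of document IDs (assumed measure)."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def cluster_attribute_values(appearances, threshold=0.8):
    """Agglomerative clustering of attribute values.

    appearances: dict mapping an attribute value to the set of
    document IDs in which that value appears.
    threshold: hypothetical cut-off; merging stops once the closest
    pair of clusters is farther apart than this.
    """
    # Start with every attribute value in its own cluster.
    clusters = [[value] for value in sorted(appearances)]

    def linkage(c1, c2):
        # Average linkage: mean pairwise distance between members.
        dists = [
            jaccard_distance(appearances[x], appearances[y])
            for x in c1
            for y in c2
        ]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        (i, j), best = min(
            (
                ((i, j), linkage(clusters[i], clusters[j]))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    return [sorted(c) for c in clusters]


def is_mixed(appearances, threshold=0.8):
    """Flag an entity as possibly mixed if its values form several clusters."""
    return len(cluster_attribute_values(appearances, threshold)) > 1
```

Under these assumptions, attribute values that co-occur in the same documents end up in one cluster, while values that never appear together stay apart. An entity whose values split into more than one cluster is a candidate for entity mixture and would still need to be checked against the knowledge base.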