EIF: A Framework of Effective Entity Identification

Lingli Li, Hongzhi Wang, Hong Gao, Jianzhong Li
DOI: https://doi.org/10.1007/978-3-642-14246-8_68
2010-01-01
Abstract: Entity identification, that is, building correspondences between objects and entities in dirty data, plays an important role in data cleaning. Confusion between entities and their names often results in dirty data: different entities may share an identical name, and different names may refer to the same entity. Therefore, the major task of entity identification is to distinguish entities that share the same name and to recognize different names that refer to the same entity. However, current research focuses on only one of these aspects and cannot solve the problem completely. To address this problem, this paper proposes EIF, a framework for entity identification that considers both kinds of confusion. With effective clustering techniques, approximate string matching algorithms, and a flexible mechanism of knowledge integration, EIF can be applied to many different kinds of entity identification problems. As an application of EIF, we solve the author identification problem. The effectiveness of the framework is verified by extensive experiments.