Towards alias detection without string similarity: an active learning based approach.

Lili Jiang, Jianyong Wang, Ping Luo, Ning An, Min Wang
DOI: https://doi.org/10.1145/2348283.2348516
2012-01-01
Abstract: Entity aliases are common, and detecting them accurately plays a vital role in various applications. In this paper, we use an active-learning-based method to detect aliases without relying on string similarity. To minimize the cost of pairwise comparison, a subset-based method restricts alias selection to a small-scale entity set. Within each generated entity set, an active-learning-based logistic regression classifier predicts whether a candidate is an alias of a given entity. Experimental results on three datasets demonstrate that the proposed approach can effectively detect this kind of entity alias.
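
As an illustration of the classifier step described in the abstract, the following is a minimal sketch of an active learning loop around a logistic regression model. It is not the authors' implementation. The abstract does not state the query strategy or the features, so the sketch assumes uncertainty sampling (query the candidate whose predicted probability is closest to 0.5) and two hypothetical numeric, non-string features per candidate. The names train, predict, active_learn, toy_oracle, and the synthetic data are all invented for this example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, x):
    # weights[0] is the bias; the remaining entries align with features in x.
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))


def train(examples, epochs=300, lr=0.5):
    # Batch gradient descent on the logistic loss, starting from zero weights.
    dim = len(examples[0][0])
    weights = [0.0] * (dim + 1)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in examples:
            err = predict(weights, x) - y
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        weights = [w - lr * g / len(examples) for w, g in zip(weights, grad)]
    return weights


def active_learn(seed, pool, oracle, budget):
    # Assumed strategy: uncertainty sampling. Each round, label the pool
    # candidate the current model is least sure about, then retrain.
    labeled = list(seed)
    unlabeled = list(pool)
    weights = train(labeled)
    for _ in range(min(budget, len(unlabeled))):
        idx = min(
            range(len(unlabeled)),
            key=lambda i: abs(predict(weights, unlabeled[i]) - 0.5),
        )
        x = unlabeled.pop(idx)
        labeled.append((x, oracle(x)))  # ask the (here simulated) annotator
        weights = train(labeled)
    return weights, labeled


def toy_oracle(x):
    # Hypothetical ground truth, used only to simulate a human annotator.
    return 1 if x[0] + x[1] > 1.0 else 0


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical candidate features, e.g. two context-based scores in [0, 1].
    pool = [(rng.random(), rng.random()) for _ in range(200)]
    seed = [((0.9, 0.9), 1), ((0.1, 0.1), 0)]
    weights, labeled = active_learn(seed, pool, toy_oracle, budget=20)
    print("labels requested:", len(labeled) - len(seed))
    print("learned weights:", weights)
```

In this sketch the model is retrained from scratch after every query, which keeps the code short but would be wasteful on large entity sets. The labeling budget is the main knob: it bounds how many candidate pairs an annotator must judge within one entity set.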