Hike: A Hybrid Human-Machine Method for Entity Alignment in Large-Scale Knowledge Bases.

Yan Zhuang, Guoliang Li, Zhuojian Zhong, Jianhua Feng
DOI: https://doi.org/10.1145/3132847.3132912
2017-01-01
Abstract: With the vigorous development of the World Wide Web, many large-scale knowledge bases (KBs) have been generated. To improve the coverage of KBs, an important task is to integrate heterogeneous KBs. Several automatic alignment methods have been proposed and have achieved considerable success. However, due to the inconsistency and uncertainty of large-scale KBs, automatic techniques for KB alignment achieve low quality (especially recall). Thanks to open crowdsourcing platforms, we can harness the crowd to improve alignment quality. To achieve this goal, in this paper we propose a novel hybrid human-machine framework for large-scale KB integration. We first partition the entities of different KBs into many smaller blocks based on their relations. We then construct a partial order on these partitions and develop an inference model that crowdsources a set of tasks and infers the answers of the other tasks from the crowdsourced answers. Next, we formulate the question selection problem, which, given a monetary budget B, selects B crowdsourced tasks to maximize the number of inferred tasks. We prove that this problem is NP-hard and propose greedy algorithms that address it with an approximation ratio of 1 - 1/e. Our experiments on real-world datasets indicate that our method improves alignment quality and outperforms state-of-the-art approaches.
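The budgeted question selection step can be illustrated with a minimal sketch. It assumes a simplified model that the abstract does not spell out: each candidate task comes with a fixed set of tasks whose answers become known once it is crowdsourced. Under that assumption the problem reduces to maximum coverage, where the standard greedy rule attains the 1 - 1/e ratio. The function name `greedy_select`, the `inferable` mapping, and the toy task IDs are all hypothetical. This is not the paper's actual inference model, which is built on a partial order over partitions.

```python
from typing import Dict, List, Set, Tuple


def greedy_select(
    inferable: Dict[str, Set[str]], budget: int
) -> Tuple[List[str], Set[str]]:
    """Pick up to `budget` tasks, each time taking the task that resolves
    the most still-unresolved tasks (standard greedy maximum coverage).

    `inferable[q]` is the hypothetical set of tasks whose answers are
    known once q is crowdsourced (assumed to include q itself).
    """
    chosen: List[str] = []
    covered: Set[str] = set()
    for _ in range(min(budget, len(inferable))):
        best, best_gain = None, 0
        # Iterate in insertion order so ties are broken deterministically.
        for task, reach in inferable.items():
            if task in chosen:
                continue
            gain = len(reach - covered)  # marginal number of new answers
            if gain > best_gain:
                best, best_gain = task, gain
        if best is None:  # no remaining task adds anything; keep the budget
            break
        chosen.append(best)
        covered |= inferable[best]
    return chosen, covered


# Toy instance: two chains of tasks, q1 -> q2 -> q3 and q4 -> q5.
toy = {
    "q1": {"q1", "q2", "q3"},
    "q2": {"q2", "q3"},
    "q3": {"q3"},
    "q4": {"q4", "q5"},
    "q5": {"q5"},
}
selected, resolved = greedy_select(toy, budget=2)
print(selected, sorted(resolved))
```

With a budget of 2, the sketch picks `q1` first (three new answers) and then `q4` (two new answers), resolving all five toy tasks with two crowdsourced questions.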