Incentive-Based Entity Collection Using Crowdsourcing

Chengliang Chai,Ju Fan,Guoliang Li
DOI: https://doi.org/10.1109/icde.2018.00039
2018-01-01
Abstract:Crowdsourced entity collection leverages human's ability to collect entities that are missing in a database, which has many real-world applications, such as knowledge base enrichment and enterprise data collection. There are several challenges. First, it is hard to evaluate the workers' quality because a worker's quality depends on not only the correctness of her provided entities but also the distinctness of these entities compared with the collected ones by other workers. Second, crowd workers are likely to provide popular entities and different workers will provide many duplicated entities, leading to a waste of money and low coverage. To address these challenges, we propose an incentive-based crowdsourced entity collection framework CrowdEC that encourages workers to provide more distinct items using an incentive strategy. CrowdEC has fundamental differences from existing crowdsourcing collection methods. One the one hand, CrowdEC proposes a worker model and evaluates a worker's quality based on cross validation and entity checking. CrowdEC devises a worker utility model that considers both worker's quality and entities' distinctness provided by workers. CrowdEC proposes a worker elimination method to block workers with a low utility, which solves the first challenge. On the other hand, CrowdEC proposes an incentive pricing technique that encourages each qualified (i.e., non-eliminated) worker to provide distinct entities rather than duplicates. CrowdEC provides two types of tasks and judiciously assigns workers with appropriate tasks to address the second challenge. We have conducted both real and simulated experiments, and the results show that CrowdEC outperforms existing state-of-the-art works on both cost and quality.
What problem does this paper attempt to address?