Abstract: Crowdsourced entity collection solicits people (a.k.a. workers) to complete missing data in a database, and it has many applications in knowledge base completion and enterprise data collection. Although previous studies have addressed the "open world" challenge of crowdsourced entity collection, they pay little attention to the "distribution" of the collected entities. In many real applications, users have distribution requirements on the collected entities, e.g., an even spatial distribution when collecting points-of-interest. In this paper, we study a new research problem, distribution-aware crowdsourced entity collection (CROWDDEC): given an expected distribution w.r.t. an attribute (e.g., region or year), the goal is to collect a set of entities via crowdsourcing while minimizing the difference between the distribution of the collected entities and the expected distribution. Due to the openness of crowdsourcing, the CROWDDEC problem calls for effective crowdsourcing quality control. We propose an adaptive worker selection approach to address this problem. The approach estimates each worker's underlying entity distribution on the fly from the entities collected so far. It then adaptively selects the set of workers that minimizes the difference from the expected distribution. Once workers submit their answers, it updates the estimates of the workers' underlying distributions for subsequent rounds of worker selection. We prove the hardness of the problem, and we develop effective estimation techniques as well as efficient worker selection algorithms to support this approach. We deployed the proposed approach on Amazon Mechanical Turk, and experimental results on two real datasets show that the approach is superior in both effectiveness and efficiency.
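The estimate-select-update loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not state the difference measure, the estimator, or the selection procedure, so the sketch assumes KL divergence as the difference, Laplace-smoothed counts as the estimator, and a simple greedy selection. All function names and the parameter `alpha` are hypothetical.

```python
import math
from collections import Counter


def estimate_distribution(entities, categories, alpha=1.0):
    """Laplace-smoothed estimate of a worker's distribution over categories.

    `entities` holds the attribute value (e.g. region) of each entity the
    worker has submitted so far. Smoothing is an assumption of this sketch.
    """
    counts = Counter(entities)
    total = len(entities) + alpha * len(categories)
    return {c: (counts[c] + alpha) / total for c in categories}


def kl_divergence(expected, actual):
    """KL(expected || actual); assumed here as the 'difference' measure."""
    return sum(p * math.log(p / actual[c]) for c, p in expected.items() if p > 0)


def mixture(distributions, categories):
    """Equal-weight mixture of the given worker distributions."""
    n = len(distributions)
    return {c: sum(d[c] for d in distributions) / n for c in categories}


def select_workers(history, expected, k):
    """Greedily pick up to k workers whose mixture is closest to `expected`.

    `history` maps a worker id to the attribute values collected from that
    worker so far. Greedy selection is a heuristic stand-in, not the paper's
    method.
    """
    categories = list(expected)
    estimates = {w: estimate_distribution(e, categories) for w, e in history.items()}
    chosen, remaining = [], set(estimates)
    while remaining and len(chosen) < k:
        # Evaluate each candidate by the divergence of the resulting mixture.
        best = min(
            sorted(remaining),
            key=lambda w: kl_divergence(
                expected,
                mixture([estimates[x] for x in chosen + [w]], categories),
            ),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    history = {
        "w1": ["north"] * 8 + ["south"] * 2,
        "w2": ["south"] * 8 + ["north"] * 2,
        "w3": ["north"] * 10,
    }
    expected = {"north": 0.5, "south": 0.5}
    print(select_workers(history, expected, k=2))
```

To mimic the update step, append each worker's newly submitted attribute values to `history` and call `select_workers` again; the estimates are recomputed from the enlarged history on every call.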
