Abstract: Crowdsourced selection asks the crowd to select the entities that satisfy a query condition, e.g., selecting the photos of people wearing sunglasses from a given set of photos. Existing studies focus on a single query predicate. In this paper, we study the crowdsourced selection problem on multi-attribute data, e.g., selecting the photos of women who have dark eyes and are wearing sunglasses. A straightforward method asks the crowd to check every predicate in the query for every entity, which incurs a huge monetary cost. Instead, we can select an optimized predicate order and ask the crowd to check the entities following that order. If an entity does not satisfy a predicate, it can be pruned without asking about the remaining predicates, so this method reduces the cost. Finding the optimized predicate order poses two challenges: the first is how to determine the predicate order, and the second is how to capture the correlation among different predicates. To address these challenges, we propose a predicate-order-based framework that reduces the monetary cost. First, we define an expectation tree that stores the selectivities of the predicates and is used to estimate the best predicate order. In each iteration, we estimate the best predicate order from the expectation tree and then choose a predicate to ask the crowd as a question. After getting the answer for the current predicate, we choose the next predicate to ask, until we obtain the result for the entity. We then update the expectation tree using the answers obtained from the crowd and continue to the next iteration. We also study the problem of answering multiple queries simultaneously and reduce its cost by exploiting the correlation between queries. Finally, we propose a confidence-based method to improve the quality. The experimental results show that our predicate-order-based algorithm is effective and reduces the cost significantly compared with baseline approaches.
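The predicate-ordering loop described above can be illustrated with a small sketch. It is not the paper's algorithm, and it rests on several assumptions. Each crowd question costs one unit. The crowd always answers correctly. Selectivities are smoothed answer counts. The best order is found by dynamic programming over subsets of predicates. The names `ExpectationTree`, `plan`, and `crowd_select` are hypothetical. In this sketch, a tree node is the set of predicates an entity is already known to satisfy. Selectivities are therefore conditional on the path, which is one simple way to let correlated predicates affect the order.

```python
from functools import lru_cache
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Entity = Tuple[bool, ...]


class ExpectationTree:
    """Hypothetical expectation tree: node = set of predicates already satisfied.

    For each (node, remaining predicate) pair it keeps crowd answer counts, so a
    predicate's selectivity is estimated conditionally on the path taken."""

    def __init__(self, num_predicates: int, prior: float = 0.5) -> None:
        self.k = num_predicates
        self.prior = prior  # assumed prior selectivity before any answers
        self.counts: Dict[Tuple[FrozenSet[int], int], List[int]] = {}

    def selectivity(self, passed: FrozenSet[int], pred: int) -> float:
        yes, total = self.counts.get((passed, pred), [0, 0])
        # Smoothed estimate of P(pred holds | every predicate in `passed` holds).
        return (yes + self.prior) / (total + 1)

    def update(self, passed: FrozenSet[int], pred: int, answer: bool) -> None:
        c = self.counts.setdefault((passed, pred), [0, 0])
        c[0] += int(answer)
        c[1] += 1

    def plan(self, passed: FrozenSet[int] = frozenset()) -> Tuple[float, List[int]]:
        """Order of the remaining predicates that minimises the expected number
        of questions per entity: E(S) = min_q [1 + sel(q | S) * E(S + {q})]."""
        full = frozenset(range(self.k))

        @lru_cache(maxsize=None)
        def solve(s: FrozenSet[int]) -> Tuple[float, Tuple[int, ...]]:
            if s == full:
                return 0.0, ()  # all predicates satisfied: nothing left to ask
            best: Tuple[float, Tuple[int, ...]] = (float("inf"), ())
            for q in sorted(full - s):
                rest_cost, rest_order = solve(s | {q})
                # Ask q now; only if it holds do we pay for the rest of the order.
                cost = 1.0 + self.selectivity(s, q) * rest_cost
                if cost < best[0]:
                    best = (cost, (q,) + rest_order)
            return best

        cost, order = solve(passed)
        return cost, list(order)


def crowd_select(entities: Sequence[Entity], num_predicates: int,
                 ask: Callable[[Entity, int], bool]):
    """Select entities satisfying all predicates, asking one predicate at a time."""
    tree = ExpectationTree(num_predicates)
    selected: List[Entity] = []
    questions = 0
    for e in entities:
        passed: FrozenSet[int] = frozenset()
        while len(passed) < num_predicates:
            q = tree.plan(passed)[1][0]     # next predicate on the estimated best order
            answer = ask(e, q)              # one crowd question
            questions += 1
            tree.update(passed, q, answer)  # refine the selectivity on this path
            if not answer:
                break                       # prune: e fails the conjunctive query
            passed = passed | {q}
        else:
            selected.append(e)              # every predicate was satisfied
    return selected, questions, tree


# Toy data (assumed): (predicate 0, predicate 1, predicate 2) per photo;
# predicate 2 is rarely true, so asking it first prunes most photos early.
photos = ([(True, True, False)] * 40 + [(True, False, False)] * 40
          + [(True, True, True)] * 5 + [(False, True, False)] * 15)
chosen, asked, tree = crowd_select(photos, 3, lambda e, q: e[q])
print(len(chosen), asked, 3 * len(photos))
```

On this toy data, the sketch quickly learns to ask the rare predicate first. It therefore needs far fewer questions than the naive method, which asks all 3 predicates for all 100 photos. The subset dynamic program is exponential in the number of predicates. It is only meant for the handful of predicates a typical query has.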

CrowdGather: Entity Extraction over Structured Domains

Distribution-Aware Crowdsourced Entity Collection

CrowdER: Crowdsourcing Entity Resolution

Entity-Relation Extraction As Multi-Turn Question Answering

Incentive-Based Entity Collection Using Crowdsourcing

Crowdsourced Collective Entity Resolution with Relational Match Propagation

Hybrid Entity Clustering Using Crowds and Data

Real-time On-Demand Crowd-powered Entity Extraction

Web-Scale Extraction of Structured Data

A Partial-Order-based Framework for Cost-Effective Crowdsourced Entity Resolution

T-Crowd: Effective Crowdsourcing for Tabular Data

An Entropy-based Approach to the Crowd Entity Resolution

Crowdsourced Data Management: A Survey

Data-Efficient Information Extraction from Form-Like Documents

Pushing the Boundaries of Crowd-enabled Databases with Query-driven Schema Expansion

Entity Extraction with Knowledge from Web Scale Corpora

Crowdsourced Selection on Multi-Attribute Data

Crowdsourcing Information Extraction for Biomedical Systematic Reviews

Techniques for Jointly Extracting Entities and Relations: A Survey

Crowdsourcing Ground Truth for Medical Relation Extraction

Crowdsourced Entity Alignment: A Decision Theory Based Approach