Abstract: We consider a crowd-sourcing problem in which, in the process of labeling massive datasets, multiple labelers of unknown annotation quality must be selected to label each incoming data sample or task, with the results aggregated using, for example, a simple or weighted majority voting rule. In this paper we approach this labeler selection problem in an online learning framework, whereby the quality of the labeling outcome produced by a specific set of labelers is estimated so that the learning algorithm learns over time to use the most effective combinations of labelers. This type of online learning falls, in some sense, under the family of multi-armed bandit (MAB) problems, but with a distinct feature not commonly seen: since the data is unlabeled to begin with and the labelers' quality is unknown, their labeling outcome (or reward in the MAB context) cannot be directly verified; it can only be estimated against the crowd and is known only probabilistically. We design an efficient online algorithm, LS_OL, that uses a simple majority voting rule and can differentiate high- and low-quality labelers over time. It is shown to have a regret (with respect to always using the optimal set of labelers) of O(log^2 T) uniformly in time under mild assumptions on the collective quality of the crowd, and is thus regret-free in the average sense. We discuss the performance improvement obtained by using a more sophisticated majority voting rule, and show how to detect and filter out "bad" (dishonest, malicious or very incompetent) labelers to further enhance the quality of crowd-sourcing. An extension to the case where a labeler's quality is task-type dependent is also discussed, using techniques from the literature on continuous arms. We present numerical results using both simulation and a real dataset consisting of a set of images labeled by Amazon Mechanical Turk (AMT) workers.
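
The idea of estimating labeler quality against the crowd, without ever seeing ground truth, can be illustrated with a small simulation. The sketch below is not LS_OL itself: it is a simplified explore-then-commit scheme, assuming binary labels, an odd number of labelers, and a fixed exploration phase. The function names (`majority_vote`, `simulate_selection`), the parameters, and the labeler accuracies are all hypothetical and chosen only for illustration.

```python
import random


def majority_vote(labels):
    """Simple majority over binary labels (0/1); assumes an odd number of votes."""
    return 1 if 2 * sum(labels) > len(labels) else 0


def simulate_selection(accuracies, n_explore, n_tasks, k, seed=0):
    """Explore-then-commit labeler selection (illustrative, not LS_OL).

    accuracies: hypothetical per-labeler probability of giving the correct label
    n_explore:  number of initial tasks sent to every labeler
    k:          odd number of labelers used after the exploration phase
    Returns (estimated agreement rates, fraction of tasks labeled correctly).
    """
    m = len(accuracies)
    assert m % 2 == 1 and k % 2 == 1 and 1 <= k <= m and n_explore >= 1
    rng = random.Random(seed)
    agree = [0] * m  # times labeler i matched the majority vote
    seen = [0] * m   # times labeler i was asked to label
    correct = 0
    for t in range(n_tasks):
        # The true label is hidden from the learner; it is used here only to
        # generate noisy labels and to score the final outcome.
        truth = rng.randint(0, 1)
        if t < n_explore:
            chosen = list(range(m))
        else:
            ranked = sorted(range(m), key=lambda i: agree[i] / seen[i], reverse=True)
            chosen = ranked[:k]
        labels = [truth if rng.random() < accuracies[i] else 1 - truth for i in chosen]
        vote = majority_vote(labels)
        # Quality is estimated against the crowd: agreement with the majority
        # stands in for the unobservable correctness of each label.
        for i, label in zip(chosen, labels):
            seen[i] += 1
            agree[i] += int(label == vote)
        correct += int(vote == truth)
    estimates = [agree[i] / seen[i] for i in range(m)]
    return estimates, correct / n_tasks


if __name__ == "__main__":
    est, acc = simulate_selection([0.9, 0.85, 0.8, 0.55, 0.5],
                                  n_explore=500, n_tasks=2000, k=3)
    print("estimated agreement rates:", [round(e, 3) for e in est])
    print("fraction labeled correctly:", round(acc, 3))
```

The agreement rates are not the labelers' true accuracies, but under the assumption that the crowd is collectively better than random they rank the labelers in a useful order, which is what the selection step relies on.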

Crowdsourcing subjective annotations using pairwise comparisons reduces bias and error compared to the majority-vote method

A Formalized Framework for Incorporating Expert Labels in Crowdsourcing Environment

Learning from Crowds under Experts' Supervision

Robust Subjective Visual Property Prediction from Crowdsourced Pairwise Labels

An Online Learning Approach to Improving the Quality of Crowd-Sourcing

Deep Robust Subjective Visual Property Prediction in Crowdsourcing

Crowd-Calibrator: Can Annotator Disagreement Inform Calibration in Subjective Tasks?

Crowdsourcing Label Quality: A Theoretical Analysis

Capturing Perspectives of Crowdsourced Annotators in Subjective Learning Tasks

Majority Voting and Pairing with Multiple Noisy Labeling

Subjective Crowd Disagreements for Subjective Data: Uncovering Meaningful CrowdOpinion with Population-level Learning

Crowdsourcing in the Absence of Ground Truth -- A Case Study

Distinguishing Question Subjectivity from Difficulty for Improved Crowdsourcing

End-to-End Annotator Bias Approximation on Crowdsourced Single-Label Sentiment Analysis

Consensus Algorithms for Biased Labeling in Crowdsourcing

Uncertainty-driven Sampling for Efficient Pairwise Comparison Subjective Assessment

Efficient Online Crowdsourcing with Complex Annotations

Listwise Approach For Rank Aggregation In Crowdsourcing

Dynamic Human Evaluation for Relative Model Comparisons

Crowd-Certain: Label Aggregation in Crowdsourced and Ensemble Learning Classification

Improving the Quality of Crowdsourced Image Labeling Via Label Similarity