Abstract: We consider a crowd-sourcing problem in which, when labeling massive datasets, multiple labelers of unknown annotation quality must be selected to label each incoming data sample or task, with their results aggregated using, for example, a simple or weighted majority voting rule. We approach this labeler selection problem in an online learning framework: the quality of the labeling outcome produced by a specific set of labelers is estimated, so that the learning algorithm learns over time to use the most effective combinations of labelers. This type of online learning falls, in some sense, under the family of multi-armed bandit (MAB) problems, but with a distinct feature not commonly seen: since the data is unlabeled to begin with and the labelers' quality is unknown, the labeling outcome (the reward in the MAB context) cannot be directly verified; it can only be estimated against the crowd and is known only probabilistically. We design an efficient online algorithm, LS_OL, using a simple majority voting rule that can differentiate high- and low-quality labelers over time, and show that it has a regret (with respect to always using the optimal set of labelers) of O(log^2 T) uniformly in time under mild assumptions on the collective quality of the crowd, and is thus regret-free in the average sense. We discuss the performance improvement obtained by using a more sophisticated majority voting rule, and show how to detect and filter out "bad" (dishonest, malicious, or very incompetent) labelers to further enhance the quality of crowd-sourcing. An extension to the case where a labeler's quality depends on the task type is also discussed, using techniques from the literature on continuous arms. We present numerical results using both simulation and a real dataset of images labeled by Amazon Mechanical Turk (AMT) workers.
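
The aggregation rules named in the abstract, and the idea of estimating a labeler's quality "against the crowd" when no ground truth is available, can be illustrated with a minimal Python sketch. This is not the LS_OL algorithm; the function names, the list-of-lists data layout, and the use of plain agreement frequency as a quality proxy are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(labels):
    """Simple majority vote: return the most frequent label.

    Ties are broken by first-seen order (an arbitrary choice for this sketch).
    """
    return Counter(labels).most_common(1)[0][0]


def weighted_majority_vote(labels, weights):
    """Weighted majority vote: return the label with the largest total weight.

    `weights[i]` is a hypothetical quality weight for the labeler who
    produced `labels[i]`.
    """
    totals = {}
    for label, weight in zip(labels, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


def estimate_agreement(label_matrix):
    """Estimate each labeler's quality without ground truth.

    `label_matrix[t][i]` is the label given by labeler i on task t.
    The returned list holds, for each labeler, the fraction of tasks on
    which that labeler agreed with the simple majority of the crowd.
    """
    num_tasks = len(label_matrix)
    num_labelers = len(label_matrix[0])
    agree_counts = [0] * num_labelers
    for row in label_matrix:
        crowd_label = majority_vote(row)
        for i, label in enumerate(row):
            if label == crowd_label:
                agree_counts[i] += 1
    return [count / num_tasks for count in agree_counts]


# Three labelers, four binary tasks; the third labeler often disagrees.
votes = [
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
]
agreement = estimate_agreement(votes)
```

Here the agreement frequencies could serve as weights for `weighted_majority_vote`, or as a score for flagging labelers who rarely agree with the crowd; how such estimates are actually used for labeler selection is specified in the paper itself, not in this sketch.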

Annotation Efficiency: Identifying Hard Samples via Blocked Sparse Linear Bandits

Robust Assignment of Labels for Active Learning with Sparse and Noisy Annotations

Blocked Collaborative Bandits: Online Collaborative Filtering with Per-Item Budget Constraints

Sparsity-Agnostic Linear Bandits with Adaptive Adversaries

Large Language Models as Annotators: Enhancing Generalization of NLP Models at Minimal Cost

Fixed-Budget Best-Arm Identification in Sparse Linear Bandits

Optimizing Active Learning for Low Annotation Budgets

EBFT: Effective and Block-Wise Fine-Tuning for Sparse LLMs

Full or Weak annotations? An adaptive strategy for budget-constrained annotation campaigns

Which Examples to Annotate for In-Context Learning? Towards Effective and Efficient Selection

Practical Obstacles to Deploying Active Learning

Robust Image Annotation Via Simultaneous Feature and Sample Outlier Pursuit

An Online Learning Approach to Improving the Quality of Crowd-Sourcing

Leveraging Offline Data in Linear Latent Bandits

Low-Rank Generalized Linear Bandit Problems

Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance

Censored Semi-Bandits for Resource Allocation

Improved Adaptive Algorithm for Scalable Active Learning with Weak Labeler

Avoid Wasted Annotation Costs in Open-set Active Learning with Pre-trained Vision-Language Model

Information Directed Sampling for Sparse Linear Bandits

Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data