Abstract: Crowdsourcing has established itself as an efficient labeling solution that distributes tasks to crowd workers. Because workers differ in expertise and can make mistakes, one core learning task is to estimate each worker's expertise and aggregate over the workers to infer the latent true labels. In this paper, we show that noise transition matrix based worker expertise modeling methods, one of the major research directions, commonly overfit the annotation noise, owing either to an oversimplified noise assumption or to inaccurate estimation. To solve this problem, we propose a knowledge distillation framework (KD-Crowd) that combines the complementary strengths of noise-model-free robust learning techniques and transition matrix based worker expertise modeling. The framework consists of two stages: in Stage 1, a noise-model-free robust student model is trained by treating the predictions of a transition matrix based crowdsourcing teacher model as noisy labels, aiming to correct the teacher's mistakes and obtain better true label predictions; in Stage 2, we switch their roles, retraining a better crowdsourcing model on the crowds' annotations, supervised by the refined true label predictions from Stage 1. Additionally, we propose an f-mutual information gain (MIG^f) based knowledge distillation loss, which finds the maximum information intersection between the student's and teacher's predictions. Our experiments show that MIG^f achieves clear improvements over the regular KL divergence knowledge distillation loss, which tends to force the student to memorize all information in the teacher's predictions, including its errors. Extensive experiments show that, as a universal framework, KD-Crowd substantially improves previous crowdsourcing methods on true label prediction and worker expertise estimation.
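As a rough illustration of two ingredients named in the abstract, the sketch below shows how a per-worker noise transition matrix maps a true-label distribution to that worker's expected annotation distribution, and how the regular KL divergence distillation loss compares a teacher's and a student's predictions. It is not the KD-Crowd implementation: the function names, the two-class setup, and all numbers are hypothetical, and the f-mutual information gain loss is not shown because the abstract does not give its formula.

```python
import math


def noisy_label_distribution(true_probs, transition):
    """Expected annotation distribution of one worker.

    Assumption: transition[k][j] = P(worker annotates j | true label is k),
    so each row of the matrix sums to 1.
    """
    num_labels = len(transition[0])
    return [
        sum(true_probs[k] * transition[k][j] for k in range(len(true_probs)))
        for j in range(num_labels)
    ]


def kl_divergence(teacher, student, eps=1e-12):
    """KL(teacher || student): the regular distillation loss on one example."""
    return sum(
        t * math.log((t + eps) / (s + eps))
        for t, s in zip(teacher, student)
        if t > 0
    )


# Hypothetical two-class example: the classifier is fairly sure of class 0.
true_probs = [0.9, 0.1]
reliable_worker = [[0.95, 0.05], [0.10, 0.90]]  # near-diagonal matrix
noisy_worker = [[0.60, 0.40], [0.45, 0.55]]     # close to random guessing

reliable_out = noisy_label_distribution(true_probs, reliable_worker)
noisy_out = noisy_label_distribution(true_probs, noisy_worker)

# The KL loss is zero only when the student copies the teacher exactly,
# which is why it also pushes the student to reproduce the teacher's errors.
loss_same = kl_divergence(true_probs, true_probs)
loss_diff = kl_divergence(true_probs, noisy_out)
```

In this toy setup the reliable worker's annotation distribution stays close to the true-label distribution, while the noisy worker's is pulled toward uniform; estimating such matrices from the annotations alone is the step the abstract describes as prone to overfitting.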

An Expert Validation Framework For Improving The Quality Of Crowdsourced Clustering

A Formalized Framework for Incorporating Expert Labels in Crowdsourcing Environment

Learning from Crowds under Experts' Supervision

Task Assignment with Guaranteed Quality for Crowdsourcing Platforms

Improving Learning-from-Crowds Through Expert Validation

An Interactive Method to Improve Crowdsourced Annotations

Human-centred Design on Crowdsourcing Annotation Towards Improving Active Learning Model Performance

CVAP: Validation for Cluster Analyses

An Improved Supervoxel Clustering Algorithm of 3D Point Clouds for the Localization of Industrial Robots

Cross-Validation Approach to Evaluate Clustering Algorithms: An Experimental Study Using Multi-Label Datasets

Cleaning Uncertain Data with Crowdsourcing - a General Model with Diverse Accuracy Rates

Crowdsourcing Label Quality: A Theoretical Analysis

Improving IoT Data Quality in Mobile Crowd Sensing: A Cross Validation Approach

Hierarchical Crowdsourcing for Data Labeling with Heterogeneous Crowd

Crowd-Certain: Label Aggregation in Crowdsourced and Ensemble Learning Classification

Leveraging Attributes And Crowdsourcing For Join

iCrowd: An Adaptive Crowdsourcing Framework

KD-Crowd: a knowledge distillation framework for learning from crowds

A visual analysis system and method for improving the quality of crowdsourcing annotated data

Learning from Crowds with Annotation Reliability

Data Quality in Crowdsourcing and Spamming Behavior Detection