Abstract: Correcting label noise is an important task, yet it is addressed infrequently in the literature. This silence stems from the difficulty of developing a robust label correction algorithm. To break the silence, we propose two algorithms to correct label noise. The first, Self-Training Correction (STC), uses self-training to re-label noisy instances. The second, Cluster-based Correction (CC), is a clustering-based method that groups instances together to infer their ground-truth labels. We also adapt an algorithm from previous work, Polishing, a consensus-based method that consults an ensemble of classifiers to change the values of attributes and labels. We simplify Polishing so that it alters only the labels of instances, and call the result Polishing Labels (PL). We experimentally compare our novel methods with PL by examining their improvements in label quality, model quality, and AUC on binary and multi-class data sets under different noise levels. Our experimental results demonstrate that CC consistently and significantly improves label quality, model quality, and AUC. We further investigate how these three noise correction algorithms improve data quality, in terms of label accuracy, in the context of image labeling in crowdsourcing. First, we look at three consensus methods for inferring a ground-truth label from the multiple noisy labels obtained from crowdsourcing: Majority Voting (MV), Dawid-Skene (DS), and KOS. We then apply the three noise correction methods to correct the labels inferred by these consensus methods. Our experimental results show that the noise correction methods improve the labeling quality significantly. As an overall result of our experiments, we conclude that CC performs the best. Our research illustrates the viability of implementing noise correction as another line of defense against labeling error, especially in a crowdsourcing setting. Furthermore, it shows the feasibility of automating the otherwise manual, expensive, and time-consuming process of analyzing a data set and correcting and cleaning its instances. © 2016 Elsevier Ltd. All rights reserved.
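The two-stage pipeline described in the abstract (infer a consensus label from multiple noisy crowd labels, then apply a noise correction step) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Majority Voting is shown in its plain form. The correction step is only a simplified stand-in for the idea behind CC: it assumes the cluster assignments are already given, and it relabels each instance with the majority label of its cluster. The function names, the toy data, and the hand-assigned cluster ids are all hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label in a non-empty list.

    Ties are broken in favor of the label that appears first.
    """
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label


def relabel_by_cluster(labels, cluster_ids):
    """Simplified, assumed correction step (not the published CC algorithm):
    give every instance the majority label of the cluster it belongs to."""
    by_cluster = {}
    for label, cid in zip(labels, cluster_ids):
        by_cluster.setdefault(cid, []).append(label)
    cluster_label = {cid: majority_vote(ls) for cid, ls in by_cluster.items()}
    return [cluster_label[cid] for cid in cluster_ids]


# Toy data: three crowd labels per image (hypothetical).
crowd_labels = [
    ["cat", "cat", "dog"],
    ["cat", "cat", "cat"],
    ["dog", "dog", "cat"],  # consensus will be "dog"; suppose the truth is "cat"
    ["dog", "dog", "dog"],
    ["dog", "cat", "dog"],
]

# Hypothetical cluster ids; in practice these would come from running a
# clustering algorithm on the image features.
cluster_ids = [0, 0, 0, 1, 1]

# Stage 1: consensus label per image via Majority Voting.
inferred = [majority_vote(votes) for votes in crowd_labels]

# Stage 2: correct the inferred labels using cluster membership.
corrected = relabel_by_cluster(inferred, cluster_ids)

print(inferred)
print(corrected)
```

In this toy case the third image receives the consensus label "dog" but shares a cluster with two "cat" images, so the correction step changes it to "cat". A single fixed clustering is used here only to keep the sketch short; the abstract does not specify how CC forms or combines its clusters.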

Improving Label Accuracy by Filtering Low-Quality Workers in Crowdsourcing.

Recovering Missing Labels of Crowdsourcing Workers.

A Formalized Framework for Incorporating Expert Labels in Crowdsourcing Environment

Learning from Crowds under Experts' Supervision

Crowdsourcing Label Quality: A Theoretical Analysis

Label Noise Correction for Crowdsourcing Using Dynamic Resampling

Effective Solution for Labeling Candidates with a Proper Ration for Efficient Crowdsourcing

Improving Crowdsourced Label Quality Using Noise Correction.

Label Noise Correction and Application in Crowdsourcing

Data Quality in Crowdsourcing and Spamming Behavior Detection

Crowdsourced Label Aggregation Using Bilayer Collaborative Clustering

False Discovery Rate Control and Statistical Quality Assessment of Annotators in Crowdsourced Ranking

An Expert Validation Framework For Improving The Quality Of Crowdsourced Clustering

Improving the Quality of Crowdsourced Image Labeling Via Label Similarity

Icrowd: An Adaptive Crowdsourcing Framework

Noise Correction of Image Labeling in Crowdsourcing

Obtaining High-Quality Label by Distinguishing Between Easy and Hard Items in Crowdsourcing

Hierarchical Crowdsourcing for Data Labeling with Heterogeneous Crowd.

CDAS: A Crowdsourcing Data Analytics System

Label Aggregation with Clustering for Biased Crowdsourced Labeling.

Robust Sparse Weighted Classification for Crowdsourcing