Modeling for Noisy Labels of Crowd Workers.

Qian Yan, Hao Huang, Yunjun Gao, Chen Ying, Qingyang Hu, Tieyun Qian, Qinming He
DOI: https://doi.org/10.1007/978-3-319-45817-5_18
2016-01-01
Abstract: Crowdsourcing services can collect large amounts of labeled data at low cost. However, because of influence factors such as unqualified crowd workers and controversial instances, the collected labels are often noisy: they may be given at random, incorrect, or missing. Although approaches have been proposed to infer these influence factors and thereby better model the labeling results, the inferences are not guaranteed to reflect the true effects of the factors on the uncertainty and errors in the labels. In this paper, we propose to conduct probability fitting over the noisy labeled data with a Bernoulli Mixture Model. Workers with similar behaviors correspond to the same Bernoulli component in the mixture model. The effects of the influence factors are fused into the Bernoulli parameter of each component, which directly reflects the uncertainty of the labels and can help identify labeling errors, predict real labels, and reveal the behavior patterns of crowd workers. Experiments on both benchmark and real datasets verify the efficacy of our model.
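To make the idea of fitting a Bernoulli mixture to crowd labels concrete, the following is a minimal sketch that uses plain expectation-maximization (EM). The abstract does not state the fitting procedure, so EM is an assumption here, as is the data layout (one row per worker, one binary label per instance, None for a missing label). The names fit_bernoulli_mixture, toy_labels, and clusters are hypothetical and are not taken from the paper.

```python
import math
import random


def fit_bernoulli_mixture(labels, n_components=2, n_iter=50, seed=0):
    """Fit a Bernoulli mixture by EM (illustrative sketch, not the paper's code).

    labels[w][i] is worker w's label for instance i: 0, 1, or None (missing).
    Returns (weights, params, resp), where params[k][i] is the probability
    that a worker in component k labels instance i as 1, and resp[w][k] is
    the posterior probability that worker w belongs to component k.
    """
    rng = random.Random(seed)
    n_workers, n_items = len(labels), len(labels[0])
    eps = 1e-6  # keeps probabilities away from 0 and 1 so log() is defined
    weights = [1.0 / n_components] * n_components
    params = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
              for _ in range(n_components)]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior component membership for each worker.
        resp = []
        for row in labels:
            log_p = []
            for k in range(n_components):
                lp = math.log(weights[k])
                for i, y in enumerate(row):
                    if y is None:  # assumption: missing labels are skipped
                        continue
                    p = params[k][i]
                    lp += math.log(p if y == 1 else 1.0 - p)
                log_p.append(lp)
            top = max(log_p)  # subtract the max for numerical stability
            unnorm = [math.exp(lp - top) for lp in log_p]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate mixing weights and Bernoulli parameters.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            weights[k] = max(nk / n_workers, eps)
            for i in range(n_items):
                num = den = 0.0
                for w, row in enumerate(labels):
                    if row[i] is not None:
                        num += resp[w][k] * row[i]
                        den += resp[w][k]
                p = num / den if den > 0 else 0.5
                params[k][i] = min(max(p, eps), 1.0 - eps)
    return weights, params, resp


# Toy data (made up for illustration): workers 0-2 agree with each other,
# workers 3-5 give the opposite labels, and one label is missing.
toy_labels = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, None, 1, 1, 0],
]
weights, params, resp = fit_bernoulli_mixture(toy_labels, n_components=2)
clusters = [max(range(2), key=lambda k: r[k]) for r in resp]
print(clusters)
```

In this sketch, workers whose responsibilities peak on the same component are treated as sharing a behavior pattern, and a fitted parameter near 0.5 marks an instance whose label is uncertain within that component. Which component index each group receives depends on the random initialization.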