Improving the Quality of Crowdsourcing Labels by Combination of Golden Data and Incentive

Peijun Yang, Haibin Cai, Zhiming Zheng
DOI: https://doi.org/10.1109/icasid.2018.8693123
2018-01-01
Abstract: The rapid rise of deep learning and AI relies heavily on massive amounts of labeled data. Crowdsourcing has become a cheap and efficient paradigm for labeling large-scale unlabeled data. However, because crowdsourcing workers (also called labelers) vary widely and unpredictably in quality, much of the data they label is low-quality or incorrectly labeled. To address this fundamental challenge, many redundancy-based ground-truth inference algorithms have been proposed in recent years; they assign each labeling task to multiple workers and infer the true label of each instance from the set of labels it receives. In this paper, we devise a novel scheme to improve the quality of labeled data and infer the true labels. The scheme uses a small proportion of golden data (instances whose correct labels are already known) to estimate workers' ability and reliability, and it uses an incentive mechanism to motivate workers to do their best. Experiments demonstrate that our method is effective and robust to low-quality workers, outperforming Majority Voting (MV) and several commonly used algorithms.
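The abstract does not say exactly how the golden data is turned into worker weights, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a symmetric "one-coin" worker model: each worker's accuracy is estimated on the golden items with Laplace smoothing, and that accuracy becomes a log-odds vote weight. It contrasts this with plain Majority Voting (MV). The incentive mechanism is not modeled. All function names and the toy data (`LABELS`, `GOLDEN`, the workers `alice`, `bob`, `carol`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: LABELS[worker][item] = label that worker assigned.
LABELS: dict[str, dict[str, str]] = {
    "alice": {"g1": "cat", "g2": "dog", "g3": "cat", "x1": "cat", "x2": "dog"},
    "bob":   {"g1": "dog", "g2": "dog", "g3": "dog", "x1": "dog", "x2": "dog"},
    "carol": {"g1": "dog", "g2": "cat", "g3": "dog", "x1": "dog", "x2": "cat"},
}
# Hypothetical golden data: items whose correct labels are known in advance.
GOLDEN: dict[str, str] = {"g1": "cat", "g2": "dog", "g3": "cat"}


def majority_vote(labels):
    """Baseline MV: each item gets its most frequent label (ties go to the first seen)."""
    votes = defaultdict(Counter)
    for worker_labels in labels.values():
        for item, label in worker_labels.items():
            votes[item][label] += 1
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}


def golden_accuracy(labels, golden, smoothing=1.0):
    """Estimate each worker's accuracy on the golden items they labeled.

    Laplace smoothing keeps every estimate strictly between 0 and 1,
    so the log-odds weight below is always finite.
    """
    acc = {}
    for worker, worker_labels in labels.items():
        seen = [item for item in worker_labels if item in golden]
        correct = sum(worker_labels[item] == golden[item] for item in seen)
        acc[worker] = (correct + smoothing) / (len(seen) + 2 * smoothing)
    return acc


def weighted_vote(labels, golden, num_classes):
    """Weighted vote with log-odds weights (an assumed one-coin worker model).

    A worker with accuracy p gets weight log(p * (K - 1) / (1 - p)); workers
    below chance level get negative weights, so their votes count against
    the label they chose.
    """
    acc = golden_accuracy(labels, golden)
    scores = defaultdict(lambda: defaultdict(float))
    for worker, worker_labels in labels.items():
        p = acc[worker]
        weight = math.log(p * (num_classes - 1) / (1 - p))
        for item, label in worker_labels.items():
            if item in golden:
                continue  # golden items keep their known labels
            scores[item][label] += weight
    inferred = {item: max(s, key=s.get) for item, s in scores.items()}
    inferred.update(golden)
    return inferred


if __name__ == "__main__":
    # MV is outvoted by the two unreliable workers on x1; the weighted vote is not.
    print(majority_vote(LABELS)["x1"], weighted_vote(LABELS, GOLDEN, 2)["x1"])  # dog cat
```

On this toy data, `alice` agrees with all three golden labels (smoothed accuracy 0.8), while `bob` and `carol` score 0.4 and 0.2. Their two "dog" votes therefore outweigh `alice` under MV but not under the golden-data weighting. Real systems would need enough golden items per worker for these estimates to be meaningful.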