Crowdsourcing Label Quality: A Theoretical Analysis

Wang Wei, Zhou Zhi-Hua
DOI: https://doi.org/10.1007/s11432-015-5391-x
2015-01-01
Abstract: Crowdsourcing has been an effective and efficient paradigm for providing labels for large-scale unlabeled data. In the past few years, many methods have been developed for inferring labels from the crowd, but few theoretical analyses have been presented to support this popular human-machine interaction process. In this paper, we theoretically study the quality of labels inferred from crowd workers by majority voting and show that the label error rate decreases exponentially with the number of workers selected for each task. We also study the problem of eliminating low-quality workers from the crowd. We provide a conservative condition under which, with high probability, low-quality workers are eliminated without eliminating any non-low-quality worker, and an aggressive condition under which, with high probability, all low-quality workers are eliminated.
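The exponential decrease of the majority-voting error rate can be illustrated with a minimal sketch. It assumes a simplified model that is not taken from the paper: binary labels, and workers who answer independently and are each correct with the same probability p_correct > 0.5. The function names and the Hoeffding-style bound exp(-2 n (p - 1/2)^2) are illustrative choices under that assumption, not the paper's own results.

```python
from math import comb, exp


def majority_vote_error(n_workers: int, p_correct: float) -> float:
    """Exact probability that a majority vote is wrong.

    Assumption (not from the paper): workers are independent and each is
    correct with probability p_correct. Ties are counted as errors.
    """
    # The vote fails when the number of correct answers k satisfies 2k <= n.
    return sum(
        comb(n_workers, k)
        * p_correct**k
        * (1.0 - p_correct) ** (n_workers - k)
        for k in range(n_workers // 2 + 1)
    )


def hoeffding_bound(n_workers: int, p_correct: float) -> float:
    """Hoeffding upper bound on the error above, valid for p_correct > 0.5."""
    return exp(-2.0 * n_workers * (p_correct - 0.5) ** 2)


if __name__ == "__main__":
    p = 0.7  # hypothetical worker accuracy
    for n in (1, 5, 11, 21, 41):
        exact = majority_vote_error(n, p)
        bound = hoeffding_bound(n, p)
        print(f"n={n:2d}  exact error={exact:.6f}  bound={bound:.6f}")
```

Under these assumptions, the exact error stays below the bound and both shrink geometrically as more workers are assigned to each task.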