Research on Data Quality Control of Crowdsourcing Annotation: A Survey

Jian Lu, Wei Li, Qingren Wang, Yiwen Zhang
DOI: https://doi.org/10.1109/dasc-picom-cbdcom-cyberscitech49142.2020.00044
2020-01-01
Abstract: It is well known that many intelligent, computer-hard tasks cannot be effectively addressed by existing machine-based approaches, so it is natural to turn to human intelligence. With the popularization of crowdsourcing concepts and the development of crowdsourcing platforms, crowdsourcing annotation has emerged as a new way for human intelligence to participate in machine computing, helping more and more supervised-learning-based approaches obtain large amounts of labeled data at relatively low cost. However, because the crowd employed by crowdsourcing platforms is diverse, controlling the quality of the labels it produces plays a key role in crowdsourcing annotation. In this survey, we first present the basic concepts and definitions of crowdsourcing annotation. We then review existing ground truth inference algorithms and learning models. After that, we report the advantages of and distinctions among these algorithms and learning models, as well as the level of research progress. Finally, we summarize the real-world datasets widely used in the field of crowdsourcing annotation and the available open-source tools.