Exploring Effectiveness of Inter-Microtask Qualification Tests in Crowdsourcing

Masaya Morinaga, Susumu Saito, Teppei Nakano, Tetsunori Kobayashi, Tetsuji Ogawa
DOI: https://doi.org/10.48550/arXiv.2012.10999
2020-12-21
Abstract: Qualification tests in crowdsourcing are often used to pre-filter workers by measuring their ability to execute microtasks. While creating a qualification test for each task type is considered a common and reasonable approach, this study investigates worker-filtering performance when the same qualification test is used across multiple types of tasks. On Amazon Mechanical Turk, we tested annotation accuracy in six different cases where the tasks had two different difficulty levels and came from the same real-world domain: four combinations in which the qualification test and the actual task were the same as or different from each other, as well as two other cases where workers with Masters Qualification were asked to perform the actual task only. The experimental results demonstrated the following two findings: i) workers who were assigned a difficult qualification test achieved higher annotation accuracy regardless of the difficulty of the actual task; ii) workers with Masters Qualification achieved higher annotation accuracy on the low-difficulty task, but were not as accurate on the high-difficulty task as those who passed a qualification test.
Human-Computer Interaction