Improving Label Accuracy by Filtering Low-Quality Workers in Crowdsourcing.

Bryce Nicholson, Victor S. Sheng, Jing Zhang, Zhiheng Wang, Xuefeng Xian
DOI: https://doi.org/10.1007/978-3-319-27060-9_45
2015-01-01
Abstract: Filtering low-quality workers from data sets labeled via crowdsourcing is often necessary because such workers either lack knowledge of the corresponding subjects and thus contribute many incorrect labels to the data set, or intentionally label quickly and imprecisely in order to produce more labels in a short time period. We present two new filtering algorithms to remove low-quality workers, called Cluster Filtering (CF) and Dynamic Classification Filtering (DCF). Both methods can use any number of characteristics of workers as attributes for learning. CF applies k-means clustering with two centroids, separating the workers into a high-quality cluster and a low-quality cluster. DCF uses a classifier of any kind to perform learning. It builds a model from a set of workers from other crowdsourced data sets and classifies the workers in the data set to be filtered. In theory, DCF can be trained to remove any proportion of the lowest-quality workers. We compare the performance of DCF with two other filtering algorithms, one by Raykar and Yu (RY), and one by Ipeirotis et al. (IPW). Our results show that CF, the second-best filter, performs modestly but effectively, and that DCF, the best filter, performs much better than RY and IPW on average and on the majority of crowdsourced data sets.
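The Cluster Filtering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the worker attributes, the deterministic centroid seeding, and the rule for deciding which cluster counts as low-quality (the one whose centroid is lower on an assumed quality-related attribute) are all assumptions made for illustration, and the names `two_means`, `cluster_filter`, and `quality_index` are hypothetical.

```python
import math


def two_means(points, iters=100):
    """Plain k-means with k=2 on a list of equal-length feature tuples."""
    # Assumption: seed the two centroids deterministically with the points
    # that have the smallest and largest first attribute.
    centroids = [min(points, key=lambda p: p[0]), max(points, key=lambda p: p[0])]
    assignment = None
    for _ in range(iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        new_assignment = [
            min((0, 1), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its current members.
        for c in (0, 1):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


def cluster_filter(worker_features, quality_index=0):
    """Return the ids of workers that fall in the presumed high-quality cluster.

    Assumption: attribute `quality_index` grows with worker quality, so the
    cluster whose centroid is lower on that attribute is treated as low-quality.
    """
    ids = list(worker_features)
    points = [tuple(worker_features[w]) for w in ids]
    assignment, centroids = two_means(points)
    low = 0 if centroids[0][quality_index] < centroids[1][quality_index] else 1
    return [w for w, a in zip(ids, assignment) if a != low]


# Hypothetical data: two made-up attributes per worker, both in [0, 1].
workers = {
    "w1": (0.92, 0.88),
    "w2": (0.89, 0.91),
    "w3": (0.95, 0.85),
    "w4": (0.41, 0.35),
    "w5": (0.38, 0.44),
}
kept = cluster_filter(workers)
print(kept)
```

Labels from the workers not in `kept` would then be dropped before label aggregation. DCF, by contrast, would replace the clustering step with a classifier trained on workers from other crowdsourced data sets, which this sketch does not attempt.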