Robust Crowdsourced Learning

Zhiquan Liu, Luo, Wu-Jun Li
DOI: https://doi.org/10.1109/bigdata.2013.6691593
2013-01-01
Abstract: Supervised learning algorithms generally need a large number of labels to achieve satisfactory performance, and collecting such labeled data is typically time-consuming and expensive. Crowdsourcing services provide an effective way to collect labeled data at much lower cost. Hence, crowdsourced learning (CL), which performs learning with labeled data collected from crowdsourcing services, has become an active research topic in recent years. Most existing CL methods exploit only the labels from different workers (annotators) for learning and ignore the attributes of the instances. In many real applications, however, the attributes of the instances are the most discriminative information for learning, so CL methods with attributes have attracted growing attention from CL researchers. One representative model of this kind is the personal classifier (PC) model, which has achieved state-of-the-art performance. However, the PC model makes the unreasonable assumption that all workers contribute equally to the final classification, which contradicts the fact that different workers have different quality (ability) in data labeling. In this paper, we propose a novel model, called robust personal classifier (RPC), for robust crowdsourced learning. Our model automatically learns an expertise score for each worker that reflects the worker's inherent quality. The final classifier of the RPC model gives high weights to good workers and low weights to poor workers or spammers, which is more reasonable than the PC model's equal weights for all workers. Furthermore, the learned expertise scores can be used to eliminate spammers or low-quality workers. Experiments on simulated datasets and UCI datasets show that the proposed model can dramatically outperform baseline models such as the PC model in terms of classification accuracy and ability to detect spammers.
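The contrast the abstract draws between equal weighting (PC) and expertise weighting (RPC) can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: per-worker classifiers are taken to be linear weight vectors, the final classifier is taken to be their normalized weighted average, and the expertise scores are supplied by hand rather than learned. The function names `combine_classifiers`, `predict`, and `filter_workers` are hypothetical and do not come from the paper.

```python
import numpy as np


def combine_classifiers(worker_weights, expertise=None):
    """Combine per-worker linear classifiers into a single weight vector.

    worker_weights: array of shape (n_workers, n_features).
    expertise: non-negative scores of shape (n_workers,). None means
        equal weights for all workers (the PC-style assumption).
    NOTE: the weighted average is an illustrative assumption, not the
    paper's actual formulation.
    """
    worker_weights = np.asarray(worker_weights, dtype=float)
    n_workers = worker_weights.shape[0]
    if expertise is None:
        expertise = np.ones(n_workers)
    expertise = np.asarray(expertise, dtype=float)
    if expertise.shape != (n_workers,):
        raise ValueError("expertise must have one score per worker")
    if np.any(expertise < 0) or expertise.sum() == 0:
        raise ValueError("expertise scores must be non-negative, not all zero")
    normalized = expertise / expertise.sum()
    # Weighted average of the workers' weight vectors.
    return normalized @ worker_weights


def predict(weights, X):
    """Binary prediction (1 or 0) from the sign of the linear score."""
    return (np.asarray(X, dtype=float) @ weights >= 0).astype(int)


def filter_workers(expertise, threshold):
    """Indices of workers whose expertise score reaches the threshold."""
    return np.flatnonzero(np.asarray(expertise, dtype=float) >= threshold)


# Two consistent workers and one hypothetical spammer whose classifier
# points the opposite way with a large magnitude.
worker_weights = [[1.0, 0.0], [0.9, 0.1], [-3.0, 0.0]]
expertise = [1.0, 1.0, 0.05]  # hand-picked scores, not learned
X = [[1.0, 0.0], [-1.0, 0.0]]

equal_pred = predict(combine_classifiers(worker_weights), X)
weighted_pred = predict(combine_classifiers(worker_weights, expertise), X)
kept = filter_workers(expertise, threshold=0.5)
```

In this toy setup the spammer dominates the equal-weight average and flips both predictions, while the expertise-weighted average follows the two consistent workers. Thresholding the scores drops the spammer. How RPC actually learns the scores is not described in the abstract and is not modeled here.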