Multi-Label Truth Inference for Crowdsourcing Using Mixture Models.

Jing Zhang, Xindong Wu
DOI: https://doi.org/10.1109/tkde.2019.2951668
IF: 9.235
2019-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: When acquiring labels from crowdsourcing platforms, a task may be designed to include multiple labels, and the values of each label may belong to a set of distinct options; this is the so-called multi-class multi-label annotation. To improve the quality of labels, requesters usually let one task be independently completed by a group of heterogeneous crowdsourced workers. Then, the true values of the multiple labels of each task are inferred from these repeated noisy labels. In this paper, we propose two novel probabilistic models, MCMLI and MCMLD, to address the multi-class multi-label inference problem in crowdsourcing. MCMLI assumes that the labels of each task are mutually independent, and MCMLD utilizes a mixture of multiple independent multinoulli distributions to capture the correlation among the labels. Both models can jointly infer multiple true labels of each instance as well as estimate the reliability of crowdsourced workers, modeled by a set of confusion matrices, with an expectation-maximization algorithm. Experiments with three typical crowdsourcing scenarios and a real-world dataset show that our proposed models significantly outperform existing competitive alternatives. When the labels are strongly correlated, MCMLD substantially outperforms MCMLI. Furthermore, our models can be easily simplified to one-coin models, which are more advantageous when errors are uniformly distributed or labels are sparse.
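The abstract's idea of jointly inferring true labels and worker confusion matrices with expectation-maximization can be illustrated with a minimal sketch. It assumes a classic Dawid-Skene-style model for a single multi-class label, which is not the authors' MCMLI or MCMLD. The function name `dawid_skene`, the `(task, worker, label)` vote format, and the `smooth` parameter are assumptions made for illustration only.

```python
from collections import defaultdict


def dawid_skene(votes, n_classes, n_iter=50, smooth=0.01):
    """Illustrative EM for one multi-class label (hypothetical helper).

    votes: iterable of (task, worker, label) with label in range(n_classes).
    Returns (post, conf): post[task] is a list of class probabilities,
    conf[worker][k][l] estimates P(worker answers l | true class is k).
    """
    by_task = defaultdict(list)
    workers = set()
    for task, worker, label in votes:
        by_task[task].append((worker, label))
        workers.add(worker)
    tasks = sorted(by_task)

    # Initialise posteriors with per-task vote fractions (soft majority vote).
    post = {}
    for t in tasks:
        counts = [0.0] * n_classes
        for _, label in by_task[t]:
            counts[label] += 1.0
        total = sum(counts)
        post[t] = [c / total for c in counts]

    conf = {}
    for _ in range(n_iter):
        # M-step: class priors and smoothed per-worker confusion matrices.
        prior = [sum(post[t][k] for t in tasks) / len(tasks)
                 for k in range(n_classes)]
        conf = {w: [[smooth] * n_classes for _ in range(n_classes)]
                for w in workers}
        for t in tasks:
            for w, label in by_task[t]:
                for k in range(n_classes):
                    conf[w][k][label] += post[t][k]
        for w in workers:
            for k in range(n_classes):
                row_sum = sum(conf[w][k])
                conf[w][k] = [v / row_sum for v in conf[w][k]]

        # E-step: posterior over the true class of each task.
        for t in tasks:
            scores = list(prior)
            for w, label in by_task[t]:
                for k in range(n_classes):
                    scores[k] *= conf[w][k][label]
            norm = sum(scores)
            post[t] = [s / norm for s in scores]
    return post, conf


# Toy data (made up): workers "a" and "b" are always right, "c" errs on tasks 0 and 1.
truth = [0, 1, 0, 1, 0, 1]
toy_votes = []
for task, y in enumerate(truth):
    toy_votes.append((task, "a", y))
    toy_votes.append((task, "b", y))
    toy_votes.append((task, "c", 1 - y if task < 2 else y))

posteriors, confusion = dawid_skene(toy_votes, n_classes=2)
inferred = [max(range(2), key=lambda k: posteriors[t][k]) for t in range(len(truth))]
print(inferred)
```

Under the independence assumption the abstract attributes to MCMLI, one could run such a routine separately for each label of a task. Capturing correlations among labels, as MCMLD does with a mixture of multinoulli distributions, would require a joint model that this sketch does not attempt.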