Crowdsourcing Truth Inference Based on Label Confidence Clustering
Gongqing Wu,Liangzhu Zhou,Jiazhu Xia,Lei Li,Xianyu Bao,Xindong Wu
DOI: https://doi.org/10.1145/3556545
IF: 4.157
2023-02-24
ACM Transactions on Knowledge Discovery from Data
Abstract:Truth inference can help solve some difficult problems of data integration in crowdsourcing. Crowdsourced workers are not experts and their labeling ability varies greatly; therefore, in practical applications, it is difficult to determine whether the labels collected from a crowdsourcing platform are correct. This article proposes a novel algorithm called truth inference based on label confidence clustering (TILCC) to improve the quality of integrated labels for the single-choice classification problem in crowdsourcing labeling tasks. We obtain the label confidence via worker reliability, which is calculated from multiple noise labels using a truth discovery method, and then we generate the clustering features and use the K-means algorithm to cluster all the tasks into K different clusters. Each cluster corresponds to a specific class, and the tasks in the cluster are assigned a label. Compared with the performances of six state-of-the-art methods, MV, ZenCrowd, PM, CATD, GLAD, and GTIC, on 12 randomly selected real-world datasets, the performance of our algorithm showed many advantages: no need to set complex parameters, faster running speed, and significantly higher accuracy.
computer science, information systems, software engineering