Unsupervised Ensemble Learning with Noisy Label Correction

Xupeng Zou, Zhongnan Zhang, Zhen He, Liang Shi
DOI: https://doi.org/10.1145/3404835.3463081
2021-01-01
Abstract: Unsupervised ensemble learning aims to estimate ground-truth labels by integrating noisy and unreliable labeling results from multiple annotators. Although many techniques have been proposed for this challenging task, there still exist some "tough" instances with noisy labels that remain misclassified after the integration, which significantly degrades classification performance. This paper introduces a novel approach to improving label accuracy based on unsupervised ensemble learning. First, we apply the expectation maximization (EM) algorithm to aggregate labels for all instances. Then we identify the instances that are most likely to be "tough" through a two-stage filtering method. Finally, an ensemble of AdaBoost-based classification models is trained on the resulting high-quality dataset and predicts new labels for these "tough" instances. Empirical results on a binary classification task show that: (1) our approach effectively identifies "tough" instances in the input dataset; (2) our approach improves the accuracy of the labels produced by unsupervised ensemble algorithms.
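To make the first step concrete, the following is a minimal sketch of EM-based label aggregation for binary labels. It assumes a simple "one-coin" annotator model (each annotator has a single unknown accuracy), which is an illustrative choice and not necessarily the EM variant used in the paper; the function name `em_aggregate` and its parameters are hypothetical. The two-stage filtering and the AdaBoost ensemble are not sketched, since the abstract does not give their details.

```python
import math


def em_aggregate(votes, n_iter=50):
    """Aggregate binary annotator votes with EM (one-coin model, an assumption).

    votes: list of rows, one per instance; each row holds one 0/1 label per
           annotator, or None where an annotator did not label the instance.
    Returns (posteriors, accuracies): P(label = 1) per instance and the
    estimated accuracy per annotator.
    """
    n_annotators = len(votes[0])
    eps = 1e-6  # keeps probabilities away from 0 and 1 so log() is defined

    # Initialise P(y_i = 1) with the fraction of positive votes (soft majority vote).
    post = []
    for row in votes:
        given = [v for v in row if v is not None]
        post.append(sum(given) / len(given))

    acc = [0.5] * n_annotators
    for _ in range(n_iter):
        # M-step: accuracy = expected fraction of votes agreeing with the latent label.
        for j in range(n_annotators):
            agree, total = 0.0, 0
            for row, p in zip(votes, post):
                if row[j] is None:
                    continue
                agree += p if row[j] == 1 else 1.0 - p
                total += 1
            if total:
                acc[j] = min(max(agree / total, eps), 1.0 - eps)
        prior = min(max(sum(post) / len(post), eps), 1.0 - eps)

        # E-step: posterior of each latent label given the current accuracies.
        new_post = []
        for row in votes:
            log1, log0 = math.log(prior), math.log(1.0 - prior)
            for j, v in enumerate(row):
                if v is None:
                    continue
                log1 += math.log(acc[j] if v == 1 else 1.0 - acc[j])
                log0 += math.log(acc[j] if v == 0 else 1.0 - acc[j])
            m = max(log1, log0)  # subtract the max for numerical stability
            p1, p0 = math.exp(log1 - m), math.exp(log0 - m)
            new_post.append(p1 / (p1 + p0))
        post = new_post
    return post, acc


if __name__ == "__main__":
    # Toy data: annotators 0 and 1 agree with each other, annotator 2 is noisy.
    toy_votes = [[1, 1, 0], [1, 1, 1], [0, 0, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
    posteriors, accuracies = em_aggregate(toy_votes)
    print([round(p, 3) for p in posteriors])
    print([round(a, 3) for a in accuracies])
```

Thresholding the posteriors at 0.5 gives the aggregated labels. Posteriors close to 0.5 mark instances on which the aggregation is uncertain, which is one plausible signal a later filtering step could draw on, although the abstract does not state which criteria the two-stage filter actually uses.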