NC-ALG: Graph-Based Active Learning under Noisy Crowd

Wentao Zhang, Yexin Wang, Zhenbang You, Yang Li, Gang Cao, Zhi Yang, Bin Cui
DOI: https://doi.org/10.1109/icde60146.2024.00210
2024-01-01
Abstract: Graph Neural Networks (GNNs) have achieved great success in various data mining tasks, but they rely heavily on a large number of annotated nodes, which requires considerable human effort. Despite the effectiveness of existing GNN-based Active Learning (AL) methods, they assume that the annotated labels are always correct, which contradicts the error-prone labeling process of a practical crowdsourcing environment. Moreover, because of this impractical assumption, existing works focus only on optimizing node selection in AL and neglect optimizing the labeling process. Therefore, we present NC-ALG, the first GNN-based AL framework that optimizes both the node selection and the node labeling process under a noisy crowd. For node selection, NC-ALG introduces a new measurement to model influence reliability and an effective influence maximization objective to select nodes. For node labeling, NC-ALG significantly reduces the labeling cost by considering the model-predicted labels and the labels of mirror nodes. To the best of our knowledge, this is the first attempt to consider GNN-based AL under a practical noisy crowd. Empirical studies on public datasets demonstrate that NC-ALG significantly outperforms existing methods in terms of labeling efficiency. Notably, NC-ALG needs only one-third of the labeling budget that the competitive baseline GRAIN needs to achieve an accuracy of 70.7% on PubMed.