Semi-supervised batch active learning based on mutual information

Xia Ji, LingZhu Wang, XiaoHao Fang
DOI: https://doi.org/10.1007/s10489-024-05962-5
IF: 5.3
2024-12-12
Applied Intelligence
Abstract: Active learning reduces the annotation cost of machine learning by selecting and querying informative unlabeled samples. Semi-supervised active learning methods can make fuller use of the regional information of unlabeled samples and thus select valuable samples more effectively. Existing semi-supervised batch active learning algorithms frequently exhibit poor robustness due to their high computational complexity, which makes large-scale datasets difficult to handle. Meanwhile, existing active learning algorithms based on high-performance semi-supervised learners adopt a single-sample selection mode, under which the model requires many rounds of iteration; this significantly reduces the overall efficiency of the algorithm and limits its practicality. To address these issues, we propose a new semi-supervised batch active learning algorithm called approximate error reduction based on mutual information (MIAER). First, we use hierarchical anchor graph regularization (HAGR) as the semi-supervised learner. HAGR exhibits good robustness and involves only a small-scale reduced Laplacian matrix in its optimization process, enabling rapid processing of large-scale datasets. Second, we propose a batch sampling strategy for the sample selection stage that is based on mutual information and error reduction. This strategy, which is built on hierarchical anchor graphs, first measures the uncertainty of samples by using approximate error reduction, which considerably reduces computational overhead. It then uses mutual information to measure the diversity of samples in category space and removes redundant samples from the batch, while preserving samples with high uncertainty as much as possible. Comparative experiments against several advanced active learning methods on a large number of datasets demonstrate the effectiveness and stability of MIAER.
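Illustrative sketch (not from the paper): the two-stage batch selection described in the abstract, scoring uncertainty first and then removing redundant candidates, can be outlined in Python as below. This is not the MIAER algorithm. As stated assumptions, prediction entropy stands in for approximate error reduction, Jensen-Shannon divergence between predicted class distributions stands in for the mutual-information diversity measure, and the names `select_batch`, `probs`, and `min_divergence` are hypothetical.

```python
import numpy as np

def entropy(probs, eps=1e-12):
    """Shannon entropy of each row of a class-probability matrix."""
    p = np.clip(probs, eps, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two probability vectors."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    m = 0.5 * (p + q)
    return 0.5 * (p * np.log(p / m)).sum() + 0.5 * (q * np.log(q / m)).sum()

def select_batch(probs, batch_size, min_divergence=0.05):
    """Greedy batch selection: most uncertain first, skipping near-duplicates.

    probs: (n_samples, n_classes) predicted class probabilities for the
           unlabeled pool (assumed to come from some semi-supervised learner).
    """
    # Stage 1 (stand-in for approximate error reduction): rank by entropy.
    scores = entropy(probs)
    order = np.argsort(-scores, kind="stable")

    # Stage 2 (stand-in for the mutual-information step): keep a candidate
    # only if its predicted distribution differs enough from every sample
    # already in the batch.
    chosen = []
    for i in order:
        if all(js_divergence(probs[i], probs[j]) >= min_divergence for j in chosen):
            chosen.append(int(i))
        if len(chosen) == batch_size:
            break
    return chosen

if __name__ == "__main__":
    pool = np.array([[0.5, 0.5], [0.5, 0.5], [0.9, 0.1], [0.4, 0.6]])
    print(select_batch(pool, batch_size=2))
```

In this toy pool, the second and fourth samples are skipped because their predicted distributions are nearly identical to the first selected sample, so the batch keeps one highly uncertain sample and one that differs in category space.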
computer science, artificial intelligence