Abstract: Active learning reduces the annotation cost of machine learning by selecting and querying informative unlabeled samples. Semi-supervised active learning methods can exploit the regional information of unlabeled samples to a considerable extent and thus select valuable samples more effectively. Existing semi-supervised batch active learning algorithms often suffer from poor robustness and high computational complexity, which makes large-scale datasets difficult to handle. Meanwhile, existing active learning algorithms built on high-performance semi-supervised learners select a single sample per round, so the model must go through many rounds of iteration, which significantly reduces the overall efficiency of the algorithm and limits its practicality. To address these issues, we propose a new semi-supervised batch active learning algorithm called approximate error reduction based on mutual information (MIAER). First, we use hierarchical anchor graph regularization (HAGR) as the semi-supervised learner. HAGR is robust, and its optimization involves only a small-scale reduced Laplacian matrix, which enables rapid processing of large-scale datasets. Second, in the sample selection stage, we propose a batch sampling strategy based on mutual information and error reduction. Built on hierarchical anchor graphs, this strategy first measures the uncertainty of samples by approximate error reduction, which considerably reduces computational overhead. It then uses mutual information to measure the diversity of samples in category space and removes redundant samples from the batch while preserving samples with high uncertainty as much as possible. Comparative experiments against several advanced active learning methods on a large number of datasets demonstrate the effectiveness and stability of MIAER.
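The two-stage selection described in the abstract (rank candidates by uncertainty, then drop candidates that are redundant with those already chosen) can be illustrated with a minimal sketch. This is not the MIAER implementation. Everything below is an assumption made for illustration: prediction entropy stands in for the approximate error reduction score, `Z` is a hypothetical row-normalized sample-to-anchor weight matrix, `A` is a hypothetical matrix of anchor class probabilities, `pair_joint` is one possible way to build a joint label table for two samples through a shared anchor, and `mi_threshold` is an arbitrary cutoff.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy (nats) of each row of a probability matrix."""
    p = np.clip(p, eps, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

def mutual_information(joint, eps=1e-12):
    """I(X;Y) in nats for a joint probability table that sums to 1."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal of X, shape (c, 1)
    py = joint.sum(axis=0, keepdims=True)   # marginal of Y, shape (1, c)
    mask = joint > eps                      # skip zero cells (0 * log 0 = 0)
    ratio = joint[mask] / (px @ py)[mask]
    return float((joint[mask] * np.log(ratio)).sum())

def pair_joint(Z, A, i, j):
    """Hypothetical joint label table for samples i and j.

    Assumption: both labels are drawn from one shared anchor k, picked
    with probability proportional to Z[i, k] * Z[j, k].
    """
    w = Z[i] * Z[j]
    if w.sum() == 0:                        # no shared anchor: independent labels
        return np.outer(Z[i] @ A, Z[j] @ A)
    w = w / w.sum()
    return (A * w[:, None]).T @ A           # sum_k w_k * outer(A[k], A[k])

def select_batch(Z, A, batch_size, mi_threshold=0.1):
    """Greedy batch: most uncertain first, skipping redundant candidates."""
    P = Z @ A                               # soft class predictions per sample
    order = np.argsort(-entropy(P), kind="stable")
    chosen = []
    for i in order:
        redundant = any(
            mutual_information(pair_joint(Z, A, i, j)) >= mi_threshold
            for j in chosen
        )
        if not redundant:
            chosen.append(int(i))
        if len(chosen) == batch_size:
            break
    return chosen

# Toy data: samples 0 and 1 are duplicates; sample 2 sits on a single anchor.
A = np.array([[0.9, 0.1], [0.1, 0.9]])
Z = np.array([[0.5, 0.5], [0.5, 0.5], [1.0, 0.0]])
print(select_batch(Z, A, batch_size=2))
```

In this toy setting the duplicate of the most uncertain sample is skipped because the two share high mutual information, and the less uncertain but non-redundant sample takes its place in the batch.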

Scalable Active Learning by Approximated Error Reduction

Uncertainty-aware Complementary Label Queries for Active Learning

Uncertainty-Based Active Learning Via Sparse Modeling for Image Classification

Uncertainty Sampling Based Active Learning with Diversity Constraint by Sparse Selection

An Expected Integrated Error Reduction Function for Accelerating Bayesian Active Learning of Failure Probability

Semi-supervised batch active learning based on mutual information

Maximum Mean Discrepancy Adversarial Active Learning

Distributed Active Learning

Active learning with adaptive regularization

A Scalable Algorithm for Graph-Based Active Learning

Heuristic improvement for active learning using localized generalization error as selection criterion

AnchorAL: Computationally Efficient Active Learning for Large and Imbalanced Datasets

Scalable Semi-Supervised Learning by Efficient Anchor Graph Regularization

A Scalable Algorithm for Active Learning

Active Learning with Adaptive Heterogeneous Ensembles

Non-myopic Active Learning with Performance Guarantee

Active learning using localized generalization error of candidate sample as criterion

Recursive Maximum Margin Active Learning

Efficient Active Learning by Querying Discriminative and Representative Samples and Fully Exploiting Unlabeled Data

Improved Adaptive Algorithm for Scalable Active Learning with Weak Labeler

Hierarchical exploration based active learning with support vector machine