Abstract: Active learning reduces the annotation cost of machine learning by selecting and querying informative unlabeled samples. Semi-supervised active learning methods can make fuller use of the regional information of unlabeled samples and thus select valuable samples more effectively. Existing semi-supervised batch active learning algorithms frequently exhibit poor robustness due to their high computational complexity, which makes large-scale datasets difficult to handle. Meanwhile, existing active learning algorithms based on high-performance semi-supervised learners adopt a single-sample selection mode, under which the model requires many rounds of iteration; this significantly reduces the overall efficiency of the algorithm and limits its practicality. To address these issues, we propose a new semi-supervised batch active learning algorithm called approximate error reduction based on mutual information (MIAER). First, we use hierarchical anchor graph regularization (HAGR) as the semi-supervised learner. HAGR exhibits good robustness and involves only a small-scale reduced Laplacian matrix in its optimization process, enabling rapid processing of large-scale datasets. Second, we propose a batch sampling strategy based on mutual information and error reduction for the sample selection stage. This strategy, which is built on hierarchical anchor graphs, first measures the uncertainty of samples by using approximate error reduction, considerably reducing computational overhead. It then uses mutual information to measure the diversity of samples in the category space and removes redundant samples from the batch while preserving high-uncertainty samples as far as possible. Comparative experiments against several advanced active learning methods on a large number of datasets demonstrate the effectiveness and stability of MIAER.
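The two-step selection idea summarized in the abstract (rank by uncertainty, then prune redundant candidates in category space) can be illustrated with a minimal sketch. This is not the MIAER algorithm: it assumes prediction entropy as a stand-in for approximate error reduction and Jensen-Shannon divergence between predicted class distributions as a stand-in for the mutual-information criterion, and it takes class probabilities as given rather than computing them with HAGR. The names `select_batch`, `js_divergence`, and `min_divergence` are hypothetical and chosen only for illustration.

```python
import numpy as np


def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) along the last axis of a probability array."""
    p = np.clip(p, eps, 1.0)
    return -(p * np.log(p)).sum(axis=-1)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two class distributions."""
    m = 0.5 * (p + q)
    value = entropy(m) - 0.5 * entropy(p) - 0.5 * entropy(q)
    return max(float(value), 0.0)  # guard against tiny negative round-off


def select_batch(probs, batch_size, min_divergence=0.02):
    """Greedy batch selection: most uncertain first, skipping near-duplicates.

    Assumptions (not taken from the paper): uncertainty is prediction
    entropy, and a candidate counts as redundant when its predicted class
    distribution is within `min_divergence` of an already selected sample.
    """
    scores = entropy(probs)
    order = np.argsort(-scores, kind="stable")  # descending uncertainty
    selected, skipped = [], []
    for i in order:
        if len(selected) == batch_size:
            break
        redundant = any(
            js_divergence(probs[i], probs[j]) < min_divergence for j in selected
        )
        (skipped if redundant else selected).append(int(i))
    # If pruning left the batch short, backfill with the most uncertain
    # skipped candidates so high-uncertainty samples are not lost.
    selected.extend(skipped[: batch_size - len(selected)])
    return selected


if __name__ == "__main__":
    # Hypothetical predicted class probabilities for four unlabeled samples.
    probs = np.array([
        [0.34, 0.33, 0.33],
        [0.35, 0.33, 0.32],  # nearly identical to the first row
        [0.05, 0.50, 0.45],
        [0.90, 0.05, 0.05],
    ])
    print(select_batch(probs, batch_size=2))
```

In this toy input the second row is almost as uncertain as the first but carries nearly the same category information, so a pure top-k uncertainty ranking would pick both, whereas the pruning step replaces it with a more distinct candidate.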