Abstract: Training many real-world learning models requires gathering and annotating large amounts of labeled data, which can be cost-prohibitive. To mitigate this data hunger, active learning (AL) and semi-supervised learning (SSL) are frequently adopted as two effective but often isolated approaches. Some recent studies have explored combining AL and SSL to better exploit the unlabeled data. However, almost all of these SSL-AL works use a simple combination strategy that ignores the inherent relation between SSL and AL. Further, other methods suffer from high computational costs when dealing with large-scale, high-dimensional datasets. Motivated by the industry practice of labeling data, we first propose an Inconsistency-based virtual aDvErsarial Active Learning (IDEAL) algorithm to further investigate SSL-AL's potential superiority and achieve mutual enhancement of AL and SSL: SSL propagates label information to unlabeled samples and provides smoothed embeddings for AL, while AL excludes samples with inconsistent predictions and considerable uncertainty for SSL. We estimate unlabeled samples' inconsistency using augmentation strategies of different granularities, including fine-grained continuous perturbation exploration and coarse-grained data transformations. Moreover, because unlabeled samples are still used inefficiently during semi-supervised training, we extend IDEAL to a curriculum-guided version, the SPL-IDEAL algorithm. SPL-IDEAL regularizes the training process towards better regions in parameter space and denoises pseudo labels with low confidence, achieving better performance. Extensive experiments on both text and image benchmark datasets validate the effectiveness of the proposed IDEAL and SPL-IDEAL algorithms against state-of-the-art baselines. Two real-world case studies illustrate the practical industrial value of applying and deploying the proposed data sampling algorithms.
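
To make the inconsistency-based selection idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that class-probability predictions for each unlabeled sample are already available for the original input and for several augmented views. The use of KL divergence as the inconsistency measure, and the names `kl_divergence`, `inconsistency_scores`, and `select_for_annotation`, are illustrative assumptions that the abstract does not specify.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of floats."""
    return sum(
        pi * (math.log(max(pi, eps)) - math.log(max(qi, eps)))
        for pi, qi in zip(p, q)
    )

def inconsistency_scores(p_orig, p_views):
    """Average divergence between original and augmented predictions.

    p_orig:  list of N class distributions (predictions on original inputs).
    p_views: list of K views, each a list of N class distributions
             (predictions on augmented inputs). Hypothetical layout.
    """
    scores = []
    for i, p in enumerate(p_orig):
        kls = [kl_divergence(p, view[i]) for view in p_views]
        scores.append(sum(kls) / len(kls))
    return scores

def select_for_annotation(scores, budget):
    """Return indices of the `budget` most inconsistent samples."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]

# Toy data: sample 1 changes its prediction under augmentation,
# samples 0 and 2 do not.
p_orig = [[0.9, 0.1], [0.6, 0.4], [0.5, 0.5]]
p_views = [
    [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]],
    [[0.9, 0.1], [0.4, 0.6], [0.5, 0.5]],
]
scores = inconsistency_scores(p_orig, p_views)
picked = select_for_annotation(scores, budget=1)
```

In this sketch the selected indices would be sent to annotators, while the remaining, more consistent samples would stay in the pool used for semi-supervised training. How the actual IDEAL algorithm combines inconsistency with uncertainty is not described in the abstract and is left out here.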
