Abstract: In the training of many real-world learning models, gathering and annotating sufficient labeled data can be cost-prohibitive. To mitigate this data hunger, active learning (AL) and semi-supervised learning (SSL) are frequently adopted as two effective but often isolated approaches. Some recent studies have explored combining AL and SSL to better exploit unlabeled data. However, almost all of these SSL-AL works use a simple combination strategy that ignores the inherent relation between SSL and AL. Further, other methods suffer from high computational costs when dealing with large-scale, high-dimensional datasets. Motivated by the industry practice of labeling data, we first propose an innovative Inconsistency-based virtual aDvErsarial Active Learning (IDEAL) algorithm to further investigate the potential superiority of SSL-AL and achieve mutual enhancement of AL and SSL: SSL propagates label information to unlabeled samples and provides smoothed embeddings for AL, while AL excludes samples with inconsistent predictions and considerable uncertainty for SSL. We estimate the inconsistency of unlabeled samples using augmentation strategies of different granularities, including fine-grained continuous perturbation exploration and coarse-grained data transformations. Moreover, because unlabeled samples are still used inefficiently during semi-supervised training, we extend IDEAL to a curriculum-guided version, the SPL-IDEAL algorithm. SPL-IDEAL can regularize the training process towards better regions in parameter space and denoise pseudo labels with low confidence, achieving better performance. Extensive experiments on both text and image benchmark datasets validate the effectiveness of the proposed IDEAL and SPL-IDEAL algorithms against state-of-the-art baselines. Two real-world case studies illustrate the practical industrial value of applying and deploying the proposed data sampling algorithms.

Collaborative Intelligence Orchestration: Inconsistency-Based Fusion of Semi-Supervised Learning and Active Learning

Unleash the Power of Inconsistency-Based Semi-Supervised Active Learning by Dynamic Programming of Curriculum Learning

Dual-Classifier Collaborative Method Based on Semi-Supervised Active Learning

Semi-supervised Active Learning for Semi-supervised Models: Exploit Adversarial Examples with Graph-based Virtual Labels

Bridging the gap with grad: Integrating active learning into semi-supervised domain generalization

InterLUDE: Interactions between Labeled and Unlabeled Data to Enhance Semi-Supervised Learning

Label Propagation with Augmented Anchors: A Simple Semi-Supervised Learning baseline for Unsupervised Domain Adaptation

Revisiting Deep Semi-supervised Learning: An Empirical Distribution Alignment Framework and Its Generalization Bound

Federated Semi-Supervised Learning with Annotation Heterogeneity

Federated Sensing: Edge-Cloud Elastic Collaborative Learning for Intelligent Sensing

ASSL-HGAT: Active semi-supervised learning empowered heterogeneous graph attention network

Knowledge-Aware Federated Active Learning with Non-IID Data

How To Overcome Confirmation Bias in Semi-Supervised Image Classification By Active Learning

FlexSSL: A Generic and Efficient Framework for Semi-Supervised Learning

DeLaLA: Semisupervised Learning via Determinately Labeling and Kernelized Large Margin Projection

MSR: Making Self-supervised learning Robust to Aggressive Augmentations

IDEAL: Influence-Driven Selective Annotations Empower In-Context Learners in Large Language Models

iSSL-AL: a deep active learning framework based on self-supervised learning for image classification

Enhancing Semi-Supervised Learning via Representative and Diverse Sample Selection

Collaborative Active Learning in Conditional Trust Environment