Abstract: In the training procedures of many real-world learning models, gathering and annotating sufficient amounts of labeled data can be cost-prohibitive. To mitigate this data-hungry problem, active learning (AL) and semi-supervised learning (SSL) are frequently adopted as two effective but often isolated means. Some recent studies have explored the potential of combining AL and SSL to better probe the unlabeled data. However, almost all of these contemporary SSL-AL works use a simple combination strategy, ignoring the inherent relation between SSL and AL. Further, other methods suffer from high computational costs when dealing with large-scale, high-dimensional datasets. Motivated by the industry practice of labeling data, we first propose an innovative Inconsistency-based virtual aDvErsarial Active Learning (IDEAL) algorithm to further investigate the potential superiority of SSL-AL and achieve mutual enhancement of AL and SSL, i.e., SSL propagates label information to unlabeled samples and provides smoothed embeddings for AL, while AL excludes samples with inconsistent predictions and considerable uncertainty for SSL. We estimate the inconsistency of unlabeled samples with augmentation strategies of different granularities, including fine-grained continuous perturbation exploration and coarse-grained data transformations. Moreover, because unlabeled samples are still used inefficiently during semi-supervised training, we extend IDEAL to a curriculum-guided version, namely the SPL-IDEAL algorithm. The SPL-IDEAL algorithm can regularize the training process towards better regions in parameter space and denoise the pseudo labels with low confidence, achieving better performance. Extensive experiments on both text and image benchmark datasets validate the effectiveness of the proposed IDEAL and SPL-IDEAL algorithms in comparison with state-of-the-art baselines. Two real-world case studies visualize the practical industrial value of applying and deploying the proposed data sampling algorithms.
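
As a rough illustration of the inconsistency idea summarized above, the following minimal sketch scores each unlabeled sample by how much a model's class probabilities disagree across several augmented views, then picks the most inconsistent samples for annotation. It is not the IDEAL algorithm itself: the abstract does not specify the scoring function, so the averaged KL divergence to the mean prediction used here, the function names `inconsistency_scores` and `select_for_labeling`, and the toy probabilities are all assumptions made for illustration.

```python
import numpy as np


def inconsistency_scores(view_probs, eps=1e-12):
    """Score each sample by prediction disagreement across augmented views.

    view_probs: array of shape (n_views, n_samples, n_classes) holding the
    class probabilities predicted for each augmented view of each sample.
    Returns an array of shape (n_samples,); 0 means all views agree.
    ASSUMPTION: mean KL divergence to the average prediction is used here
    only as a stand-in for the paper's unspecified inconsistency measure.
    """
    p = np.clip(np.asarray(view_probs, dtype=float), eps, 1.0)
    mean_p = p.mean(axis=0, keepdims=True)
    # KL(view || mean) per view and sample, then averaged over views.
    kl = np.sum(p * (np.log(p) - np.log(mean_p)), axis=-1)
    return kl.mean(axis=0)


def select_for_labeling(view_probs, budget):
    """Return indices of the `budget` most inconsistent samples."""
    scores = inconsistency_scores(view_probs)
    order = np.argsort(-scores, kind="stable")
    return order[:budget]


# Toy data (hypothetical): 2 augmented views, 3 samples, 2 classes.
view_probs = np.array([
    [[0.9, 0.1], [0.9, 0.1], [0.6, 0.4]],  # predictions on view 1
    [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]],  # predictions on view 2
])
scores = inconsistency_scores(view_probs)
chosen = select_for_labeling(view_probs, budget=1)
```

In this toy setup the second sample, whose predicted class flips between views, receives the highest score and is the one selected, while the first sample, whose predictions are identical across views, scores zero.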

Unleash the Power of Inconsistency-Based Semi-Supervised Active Learning by Dynamic Programming of Curriculum Learning
