Abstract:In many applications, data is easy to acquire but expensive and time-consuming to label prominent examples include medical imaging and NLP. This disparity has only grown in recent years as our ability to collect data improves. Under these constraints, it makes sense to select only the most informative instances from the unlabeled pool and request an oracle (e.g., a human expert) to provide labels for those samples. The goal of active learning is to infer the informativeness of unlabeled samples so as to minimize the number of requests to the oracle. Here, we formulate active learning as an open-set recognition problem. In this paradigm, only some of the inputs belong to known classes; the classifier must identify the rest as unknown. More specifically, we leverage variational neural networks (VNNs), which produce high-confidence (i.e., low-entropy) predictions only for inputs that closely resemble the training data. We use the inverse of this confidence measure to select the samples that the oracle should label. Intuitively, unlabeled samples that the VNN is uncertain about are more informative for future training. We carried out an extensive evaluation of our novel, probabilistic formulation of active learning, achieving state-of-the-art results on MNIST, CIFAR-10, and CIFAR-100. Additionally, unlike current active learning methods, our algorithm can learn tasks without the need for task labels. As our experiments show, when the unlabeled pool consists of a mixture of samples from multiple datasets, our approach can automatically distinguish between samples from seen vs. unseen tasks.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is how to effectively select the most worthy - to - be - labeled samples in the unlabeled data pool through Active Learning (AL) to minimize the number of requests to the oracle when data labeling is expensive and time - consuming. Specifically, the paper proposes a new method to treat the active learning problem as an Open - Set Recognition (OSR) problem. In OSR, only some of the inputs belong to known classes, and the classifier must identify the remaining inputs as unknown classes. This method is especially suitable for when the unlabeled data pool contains mixed samples from multiple datasets and can automatically distinguish samples of seen tasks from those of unseen tasks. The main contributions of the paper are as follows: 1. **Proposing a new active learning framework**: Using Variational Neural Networks (VNNs) as classifiers, this network only produces high - confidence (low - entropy) predictions for inputs highly similar to the training data. By using the inverse of this confidence to select the next unlabeled sample to be queried, that is, selecting the sample that the classifier is most uncertain about for labeling. 2. **Improving model performance**: On multiple image classification datasets such as MNIST, CIFAR - 10 and CIFAR - 100, this method has achieved state - of - the - art results. 3. **Enhancing robustness**: Experiments show that this method exhibits good robustness in cases of skewed initial labeled pools, different budget sizes, noisy labels, and the presence of samples from unseen distributions. In conclusion, the paper aims to provide a more efficient and robust sample selection mechanism by combining active learning with open - set recognition, thereby reducing the need for a large amount of labeled data, especially in fields such as medical imaging and natural language processing where data acquisition is relatively easy but labeling costs are high.

Deep Active Learning via Open Set Recognition

Conditional Gaussian Distribution Learning for Open Set Recognition

Deep Active Learning in the Open World

DIRECT: Deep Active Learning under Imbalance and Label Noise

Open Long-Tailed Recognition In A Dynamic World

Know Yourself Better: Diverse Discriminative Feature Learning Improves Open Set Recognition

Bidirectional Uncertainty-Based Active Learning for Open Set Annotation

Frugal Reinforcement-based Active Learning

Inconsistency-Based Data-Centric Active Open-Set Annotation

Class-aware Variational Auto-encoder for Open Set Recognition

Nearest Neighbor Classifier Embedded Network for Active Learning

Unlabeled data selection for active learning in image classification

Learning from Open-Set Noisy Labels Based on Multi-Prototype Modeling

Distribution Networks for Open Set Learning

Semi-Supervised Variational Adversarial Active Learning via Learning to Rank and Agreement-Based Pseudo Labeling

P-ODN: Prototype based Open Deep Network for Open Set Recognition

Active Learning for Vision-Language Models

OpenIncrement: A Unified Framework for Open Set Recognition and Deep Class-Incremental Learning

Deep Active Learning with Noise Stability

Contrastive Open-set Active Learning based Sample Selection for Image Classification

GALAXY: Graph-based Active Learning at the Extreme