Abstract: The performance of active learning algorithms can be improved in two ways. The often used and intuitive way is by reducing the overall error rate within the test set. The second way is to ensure that correct predictions are not forgotten when the training set is increased in between rounds. The former is measured by the accuracy of the model and the latter is captured in negative flips between rounds. Negative flips are samples that are correctly predicted when trained with the previous/smaller dataset and incorrectly predicted after additional samples are labeled. In this paper, we discuss improving the performance of active learning algorithms both in terms of prediction accuracy and negative flips. The first observation we make in this paper is that negative flips and overall error rates are decoupled and reducing one does not necessarily imply that the other is reduced. Our observation is important as current active learning algorithms do not consider negative flips directly and implicitly assume the opposite. The second observation is that performing targeted active learning on subsets of the unlabeled pool has a significant impact on the behavior of the active learning algorithm and influences both negative flips and prediction accuracy. We then develop ROSE - a plug-in algorithm that utilizes a small labeled validation set to restrict arbitrary active learning acquisition functions to negative flips within the unlabeled pool. We show that integrating a validation set results in a significant performance boost in terms of accuracy, negative flip rate reduction, or both.


### What problems does this paper attempt to solve?

This paper aims to solve two key problems in active learning:

1. **Reduce the overall error rate**: Improve the accuracy of the model on the test set, which is the common optimization goal in active learning.
2. **Prevent correct predictions from being forgotten**: Ensure that, as the training set grows, samples that were previously predicted correctly are not mispredicted. Such samples are called "negative flips": they are predicted correctly by the model trained on the previous, smaller dataset and incorrectly after additional samples are labeled.

Specifically, the authors point out that current active learning algorithms usually focus only on reducing the overall error rate and ignore negative flips. However, reducing the overall error rate does not necessarily reduce negative flips, which may even increase. The paper therefore proposes a new method that aims to improve accuracy and reduce negative flips simultaneously.

### Main contributions

1. **Comprehensive empirical analysis**: The authors conduct a detailed empirical analysis of the regression phenomenon in active learning, revealing that negative flips and the overall error rate are decoupled.
2. **The RoSE algorithm**: The authors propose a plug-in algorithm named RoSE (Regression-ordered Subset Estimation), which uses a small labeled validation set to restrict any active learning acquisition function, thereby specifically targeting negative flips.
3. **Extensive experimental evaluation**: Experiments on three datasets, seven acquisition functions, and two architectures show that RoSE is effective in improving accuracy and reducing negative flips.

### Method overview

The core idea of the RoSE algorithm is to estimate the negative flip subset in the unlabeled pool and apply the acquisition function to this subset only, thereby achieving more effective sample selection. The specific steps are as follows:

1. **Search space reduction**: Divide the unlabeled pool into the union of positive flips and negative flips (SPN) and the other categories (SCW).
2. **Misprediction detection**: Use a small labeled validation set to detect negative flip samples in SPN.

In this way, RoSE can significantly reduce negative flips while maintaining or improving model accuracy, thereby improving the overall performance of active learning.

### Formula representation

The negative flip rate (NFR) is defined as:

\[ \text{NFR} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left(\tilde{y}^{\text{old}}_i = y_i,\ \tilde{y}^{\text{new}}_i \neq y_i\right) \]

where \( \mathbb{1}\left(\tilde{y}^{\text{old}}_i = y_i,\ \tilde{y}^{\text{new}}_i \neq y_i\right) \) is a binary indicator that is 1 when sample \( i \) is a negative flip (correct under the old model, incorrect under the new one) and 0 otherwise, and the sum runs over the \( N \) evaluated samples.

The objective of the RoSE algorithm can be formalized as:

\[ X^* = \arg\max_{x_1, \ldots, x_b \in q_\phi(D_{\text{pool}})} a\left(x_1, \ldots, x_b \mid h_{\text{new}}(x)\right) \]

where \( a \) is the acquisition function and \( q_\phi(D_{\text{pool}}) \) is a subset estimation function that selects negative flip samples from the unlabeled pool.

Through these improvements, RoSE can provide better performance and stability in complex active learning tasks.
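The NFR definition and the restricted acquisition objective above can be illustrated with a small Python sketch. It is not the authors' implementation. The function names (`negative_flip_rate`, `entropy`, `select_batch`) and the toy data are hypothetical. Predictive entropy stands in for the acquisition function \( a \), and picking the top \( b \) samples by individual score is a simplification of the joint batch objective. The candidate subset, which plays the role of \( q_\phi(D_{\text{pool}}) \), is given by hand here rather than estimated from a validation set.

```python
import math


def negative_flip_rate(y_true, pred_old, pred_new):
    """Fraction of samples that the old model got right and the new model gets wrong."""
    flips = [po == y and pn != y for y, po, pn in zip(y_true, pred_old, pred_new)]
    return sum(flips) / len(flips)


def entropy(probs):
    """Predictive entropy of one class-probability vector (stand-in acquisition score)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_batch(pool_probs, subset_indices, b):
    """Return the b pool indices with the highest score, searching only subset_indices.

    subset_indices is assumed to come from some subset estimator; this sketch
    does not show how that estimate is produced.
    """
    ranked = sorted(subset_indices, key=lambda i: entropy(pool_probs[i]), reverse=True)
    return ranked[:b]


# Toy example: the error rate falls from 0.4 to 0.2 between rounds,
# yet one previously correct sample (index 1) becomes wrong.
y_true = [0, 1, 1, 0, 1]
pred_old = [0, 1, 0, 0, 0]
pred_new = [0, 0, 1, 0, 1]
nfr = negative_flip_rate(y_true, pred_old, pred_new)

# Toy pool: index 0 has the highest entropy but lies outside the candidate subset.
pool_probs = [[0.5, 0.5], [0.9, 0.1], [0.6, 0.4], [0.99, 0.01]]
candidate_subset = [1, 2, 3]
batch = select_batch(pool_probs, candidate_subset, b=2)

print(nfr)    # 0.2
print(batch)  # [2, 1]
```

The toy predictions show the decoupling described in the abstract: accuracy improves between rounds while the negative flip rate is still nonzero. The pool example shows the effect of the restriction: the most uncertain sample overall is never queried because it falls outside the candidate subset.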

Targeting Negative Flips in Active Learning using Validation Sets

Avoid Wasted Annotation Costs in Open-set Active Learning with Pre-trained Vision-Language Model

Robustness-Congruent Adversarial Training for Secure Machine Learning Model Updates

LPLgrad: Optimizing Active Learning Through Gradient Norm Sample Selection and Auxiliary Model Training

Optimizing Active Learning for Low Annotation Budgets

Personalized Negative Reservoir for Incremental Learning in Recommender Systems

Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering

Relabeling Minimal Training Subset to Flip a Prediction

Enhancing Vision-Language Few-Shot Adaptation with Negative Learning

Revisiting Active Learning in the Era of Vision Foundation Models

Adversarial Unlearning: Reducing Confidence Along Adversarial Directions

Deep Individual Active Learning: Safeguarding Against Out-of-Distribution Challenges in Neural Networks

Active Negative Loss: A Robust Framework for Learning with Noisy Labels

Practical Obstacles to Deploying Active Learning

Leveraging Variation Theory in Counterfactual Data Augmentation for Optimized Active Learning

Fair Active Learning in Low-Data Regimes

Querying Easily Flip-flopped Samples for Deep Active Learning

Overcoming Overconfidence for Active Learning

LFighter: Defending against the label-flipping attack in federated learning

Enhancing Security in Federated Learning through Adaptive Consensus-Based Model Update Validation

Adversarial Active Learning for Deep Networks: a Margin Based Approach