Abstract: Labeled data are critical to modern machine learning applications, but obtaining labels can be expensive. To mitigate this cost, machine learning methods, such as transfer learning, semi-supervised learning and active learning, aim to be label-efficient: achieving high predictive performance from relatively few labeled examples. While obtaining the best label-efficiency in practice often requires combinations of these techniques, existing benchmark and evaluation frameworks do not capture a concerted combination of all such techniques. This paper addresses this deficiency by introducing LabelBench, a new computationally-efficient framework for joint evaluation of multiple label-efficient learning techniques. As an application of LabelBench, we introduce a novel benchmark of state-of-the-art active learning methods in combination with semi-supervised learning for fine-tuning pretrained vision transformers. Our benchmark demonstrates better label-efficiencies than previously reported in active learning. LabelBench's modular codebase is open-sourced for the broader community to contribute label-efficient learning methods and benchmarks. The repository can be found at: https://github.com/EfficientTraining/LabelBench

What problem does this paper attempt to address?

The paper mainly addresses the following issues:

### Research Background and Objectives

- **Label Cost Issue**: In modern machine learning applications, labeled data is crucial but expensive to obtain.
- **Improving Label Efficiency**: Research on how to achieve high predictive performance with fewer labeled samples.

### Solution

- **LabelBench Framework**: Proposes a comprehensive and computationally efficient framework for evaluating the combined effects of various label-efficient learning techniques.
- **Combining Multiple Techniques**: Integrates multiple label-efficient learning methods such as transfer learning, semi-supervised learning (Semi-SL), and active learning (AL) into a unified evaluation framework.
- **For Large Pre-trained Models**: Focuses particularly on the application of these techniques on large-scale pre-trained models to achieve better label efficiency.

### Main Contributions

1. **LabelBench Framework**: A new framework for jointly evaluating multiple label-efficient learning techniques, capable of effectively handling computational challenges, especially in large-scale neural network architectures.
2. **Lightweight Retraining Scheme**: Proposes a lightweight retraining scheme based on updating only the last layer of large pre-trained models, significantly reducing training costs while maintaining most of the label efficiency gains brought by active learning.
3. **Comprehensive Experimental Results**: Demonstrates through experiments the performance of combining various deep active learning algorithms with semi-supervised learning in fine-tuning large pre-trained vision transformers. Experimental results show that this approach can significantly reduce annotation costs compared to traditional methods, especially on datasets like CIFAR-10 and ImageNet.

### Experimental Highlights

- On the CIFAR-10 dataset, using active learning methods can save up to 75% of annotation costs compared to random sampling.
- Under a fixed annotation budget, active learning algorithms can significantly improve test accuracy by over 1.2% and increase prediction accuracy on the unlabeled training data pool by over 5%.
- Compared to previous best results, the new method improves test accuracy by at least 10% under the same settings.

### Conclusion

LabelBench provides a lightweight benchmarking framework that allows researchers to test their algorithms in more realistic and larger-scale scenarios.
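To make the comparison between active learning and random sampling concrete, here is a minimal sketch of one selection step. It uses margin sampling, a common uncertainty-based strategy, chosen here purely for illustration; the summary above does not say which strategies LabelBench implements. The names `margin`, `select_batch`, and `label_savings` are hypothetical, as are the toy probabilities and the label counts in the usage example.

```python
import random


def margin(probs):
    """Gap between the two highest class probabilities (small = uncertain)."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def select_batch(pool_probs, labeled, batch_size, strategy="margin", rng=None):
    """Return indices of unlabeled pool examples to annotate next.

    pool_probs: one list of class probabilities per pool example,
                as predicted by the current model (hypothetical input).
    labeled:    set of pool indices that already have labels.
    """
    unlabeled = [i for i in range(len(pool_probs)) if i not in labeled]
    if strategy == "random":
        # Baseline: pick examples uniformly at random from the unlabeled pool.
        rng = rng or random.Random(0)
        return sorted(rng.sample(unlabeled, min(batch_size, len(unlabeled))))
    # Margin sampling: query the examples the model is least sure about.
    ranked = sorted(unlabeled, key=lambda i: margin(pool_probs[i]))
    return ranked[:batch_size]


def label_savings(labels_random, labels_active):
    """Fraction of annotation cost saved when both reach the same accuracy."""
    return 1.0 - labels_active / labels_random


if __name__ == "__main__":
    # Toy predictions for a five-example pool (made-up numbers).
    pool_probs = [
        [0.98, 0.01, 0.01],  # confident prediction
        [0.40, 0.38, 0.22],  # ambiguous
        [0.70, 0.20, 0.10],
        [0.34, 0.33, 0.33],  # most ambiguous
        [0.55, 0.44, 0.01],
    ]
    print(select_batch(pool_probs, labeled={2}, batch_size=2))
    # Hypothetical label counts needed to reach the same accuracy.
    print(label_savings(labels_random=4000, labels_active=1000))
```

In a full loop, the selected examples would be annotated, the model retrained (for instance only its last layer, as in the lightweight retraining scheme described above), and the selection repeated until the annotation budget is spent. With the hypothetical counts shown, `label_savings` returns 0.75, which is what a 75% saving means: active learning reaches the same accuracy with a quarter of the labels.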

LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning

Learning to Label with Active Learning and Reinforcement Learning

Comprehensive and Efficient Data Labeling via Adaptive Model Scheduling

ALBench: A Framework for Evaluating Active Learning in Object Detection

Adaptive Model Scheduling for Resource-efficient Data Labeling

Personalized Benchmarking with the Ludwig Benchmarking Toolkit

Active Testing: Sample-Efficient Model Evaluation

Active Learning with Label Quality Control

Label Smarter, Not Harder: CleverLabel for Faster Annotation of Ambiguous Image Classification with Higher Quality

LAMM: Label Alignment for Multi-Modal Prompt Learning

Synergistic Training: Harnessing Active Learning and Pseudo-Labeling for Enhanced Model Performance in Deep Learning

Efficient Human-in-the-Loop Active Learning: A Novel Framework for Data Labeling in AI Systems

Cost-effective Batch-mode Multi-label Active Learning

On the Marginal Benefit of Active Learning: Does Self-Supervision Eat Its Cake?

Label-Efficient Deep Learning in Medical Image Analysis: Challenges and Future Directions

An Empirical Study on the Efficacy of Deep Active Learning Techniques

A Simple yet Effective Framework for Active Learning to Rank

An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models

Frugal Reinforcement-based Active Learning

Comparing Visual-Interactive Labeling with Active Learning: An Experimental Study