A Cross-Domain Benchmark for Active Learning

Thorben Werner, Johannes Burchert, Maximilian Stubbemann, Lars Schmidt-Thieme

2024-08-01

Abstract: Active Learning (AL) deals with identifying the most informative samples for labeling to reduce data annotation costs for supervised learning tasks. AL research suffers from the fact that lifts reported in the literature generalize poorly and that only a small number of repetitions of experiments are conducted. To overcome these obstacles, we propose CDALBench, the first active learning benchmark which includes tasks in computer vision, natural language processing and tabular learning. Furthermore, by providing an efficient, greedy oracle, CDALBench can be evaluated with 50 runs for each experiment. We show that both the cross-domain character and a large number of repetitions are crucial for sophisticated evaluation of AL research. Concretely, we show that the superiority of specific methods varies over the different domains, making it important to evaluate Active Learning with a cross-domain benchmark. Additionally, we show that having a large number of runs is crucial. With only three runs, as is often done in the literature, the superiority of specific methods can vary strongly with the specific runs. This effect is so strong that, depending on the seed, even a well-established method's performance can be significantly better or significantly worse than random for the same dataset.

Machine Learning

What problem does this paper attempt to address?

The paper addresses two weaknesses in Active Learning (AL) research: results that generalize poorly across domains, and conclusions that are unreliable because experiments are repeated too few times. Specifically:

1. **Domain Generalization Issue**: Existing active learning methods are often evaluated in only one domain (such as computer vision or natural language processing), so their reported performance is difficult to transfer to other domains. A cross-domain benchmark is needed to evaluate these methods across application areas.
2. **Insufficient Experimental Repetitions**: Because of limited computational resources, many studies run each experiment only a few times (e.g., 3 runs). The resulting variance is high enough that meaningful conclusions are hard to draw. For example, the same method may perform significantly worse than random selection under some random seeds and significantly better under others.

To address these issues, the authors propose **CDALBench**, an active learning benchmark that covers computer vision, natural language processing, and tabular data. Each experiment is repeated 50 times, which lets CDALBench evaluate active learning methods more reliably and reveal how their relative performance differs across domains. The authors also provide an efficient, greedy oracle, so that the benchmark, oracle included, can be evaluated with 50 runs per experiment.
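The effect of too few repetitions can be illustrated with a small simulation. The sketch below is not taken from the paper: the accuracies are synthetic, and the mean gap (0.5 points), the run-to-run standard deviation (1.5 points), and all function names are assumptions chosen purely for illustration. It draws 50 simulated runs for a hypothetical AL method and for random selection, then checks how often a comparison based on only 3 runs would declare the method the winner.

```python
import random
import statistics


def simulate_runs(mean_acc, std, n_runs, rng):
    """Draw synthetic final accuracies, one per seed (illustrative only)."""
    return [rng.gauss(mean_acc, std) for _ in range(n_runs)]


def fraction_method_wins(method_runs, random_runs, k, n_trials, rng):
    """Fraction of k-run subsamples in which the method's mean beats random's mean."""
    wins = 0
    for _ in range(n_trials):
        method_subset = rng.sample(method_runs, k)
        random_subset = rng.sample(random_runs, k)
        if statistics.mean(method_subset) > statistics.mean(random_subset):
            wins += 1
    return wins / n_trials


rng = random.Random(0)

# Hypothetical numbers, not results from the paper.
method_runs = simulate_runs(mean_acc=80.5, std=1.5, n_runs=50, rng=rng)
random_runs = simulate_runs(mean_acc=80.0, std=1.5, n_runs=50, rng=rng)

# With 3 runs, the verdict depends on which runs happen to be picked.
win_3 = fraction_method_wins(method_runs, random_runs, k=3, n_trials=2000, rng=rng)

# With all 50 runs, every "subsample" is the full set, so the verdict is fixed.
win_50 = fraction_method_wins(method_runs, random_runs, k=50, n_trials=2000, rng=rng)

print(f"method wins in {win_3:.0%} of 3-run comparisons")
print(f"method wins in {win_50:.0%} of 50-run comparisons")
```

In this setup the 3-run comparison flips between "better than random" and "worse than random" depending on the sampled runs, while the 50-run comparison always gives the same answer. A real evaluation would also apply a significance test rather than compare means alone.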

Towards Comparable Active Learning

ALdataset: a benchmark for pool-based active learning

Benchmarking Multi-Domain Active Learning on Image Classification

ALPBench: A Benchmark for Active Learning Pipelines on Tabular Data

Composite Active Learning: Towards Multi-Domain Active Learning with Theoretical Guarantees

Active Learning Over Multiple Domains in Natural Language Tasks

ALBench: A Framework for Evaluating Active Learning in Object Detection

Practical Obstacles to Deploying Active Learning

Learning Distinctive Margin Toward Active Domain Adaptation

ActiveGLAE: A Benchmark for Deep Active Learning with Transformers

ALE: A Simulation-Based Active Learning Evaluation Framework for the Parameter-Driven Comparison of Query Strategies for NLP

Domain Adversarial Active Learning for Domain Generalization Classification

Re-Benchmarking Pool-Based Active Learning for Binary Classification

A Survey of Deep Active Learning

Deep Bayesian Active Learning for Natural Language Processing: Results of a Large-Scale Empirical Study

Perturbation-Based Two-Stage Multi-Domain Active Learning

Active Code Learning: Benchmarking Sample-Efficient Training of Code Models

Optimizing Active Learning for Low Annotation Budgets

On the Limitations of Simulating Active Learning

Multi-domain active learning for text classification