Re-Benchmarking Pool-Based Active Learning for Binary Classification

Po-Yi Lu, Chun-Liang Li, Hsuan-Tien Lin
2023-09-24
Abstract: Active learning is a paradigm that significantly enhances the performance of machine learning models when acquiring labeled data is expensive. While several benchmarks exist for evaluating active learning strategies, their findings exhibit some misalignment. This discrepancy motivates us to develop a transparent and reproducible benchmark for the community. Our efforts result in an open-sourced implementation (https://github.com/ariapoy/active-learning-benchmark) that is reliable and extensible for future research. By conducting thorough re-benchmarking experiments, we have not only rectified misconfigurations in existing benchmarks but also shed light on the under-explored issue of model compatibility, which directly causes the observed discrepancy. Resolving the discrepancy reassures that the uncertainty sampling strategy of active learning remains an effective and preferred choice for most datasets. Our experience highlights the importance of dedicating research efforts towards re-benchmarking existing benchmarks to produce more credible results and gain deeper insights.
Machine Learning
What problem does this paper attempt to address?
The paper addresses two problems with existing benchmarks for pool-based active learning: their conclusions are inconsistent, and they lack transparency. Specifically:

1. **Inconsistency of existing benchmarks**: Pool-based active learning benchmarks developed by different research teams have reached contradictory conclusions. For example, Yang and Loog [54] find that Uncertainty Sampling (US) performs best on most datasets, while Zhan et al. [60] find that Learning Active Learning (LAL) is superior to Uncertainty Sampling in binary classification tasks. This inconsistency can confuse practitioners who need to choose an active learning strategy.
2. **Lack of transparency and reproducibility**: Existing benchmarks do not release their source code publicly, which makes it difficult for other researchers to reproduce the results. Although Zhan et al. [60] later released part of their source code, it lacks the details needed to fully reproduce their experimental results.

To solve these problems, the authors developed a transparent and reproducible benchmarking framework. Through re-benchmarking experiments, they corrected the configuration errors in the existing benchmarks and revealed the impact of model compatibility issues on Uncertainty Sampling. Finally, they confirmed that Uncertainty Sampling is still an effective choice for most datasets and emphasized the importance of re-benchmarking.

### Main contributions:

1. **Develop a transparent and reproducible benchmarking framework**: The authors provide an open-source implementation (https://github.com/ariapoy/active-learning-benchmark) that the community can build on in future research.
2. **Discover model compatibility issues**: The authors show that incompatibility between the query model and the task model degrades the performance of Uncertainty Sampling, which explains the inconsistent conclusions of the earlier benchmarks.
3. **Evaluate multiple active learning strategies**: The authors found that more than half of the active learning strategies are not significantly better than the Uniform Sampling baseline in binary classification tasks, which prompts the community to re-examine the gap between algorithm design and practical applications.

### Experimental setup:

- **Dataset splitting**: 40% of the data is held out as the test set, and 20 randomly selected samples from the remaining data form the initial labeled pool.
- **Model selection**: SVM(RBF) with default hyperparameters is used as the task model.
- **Budget setting**: The budget is equal to the size of the initial unlabeled pool.
- **Query strategy**: 17 different query strategies are compared: US-C, US-NC, QBC, VR, EER, Core-Set, Graph, Hier, HintSVM, QUIRE, DWUS, InfoDiv, MCM, BMDR, SPAL, ALBL and LAL.

### Experimental results:

- **Performance of Uncertainty Sampling (US-C)**: On most datasets, the average AUBC of US-C is higher than that of the other query strategies.
- **Impact of model compatibility**: Pairing a query model with an incompatible task model significantly reduces the performance of Uncertainty Sampling.
- **Strategy ranking**: US-C usually ranks first, followed by MCM and QBC.

### Conclusion:

Through re-benchmarking, the authors not only corrected the errors in the existing benchmarks but also revealed the important impact of model compatibility on the performance of Uncertainty Sampling. These findings help practitioners choose and apply active learning strategies more accurately.
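
The experimental protocol summarized above (40% test split, 20 initial labels, a budget equal to the unlabeled pool size, and AUBC as the score) can be illustrated with a small Python sketch. This is not the authors' implementation, and several parts are assumptions made for illustration: the data is a synthetic two-blob set, SVM(RBF) is replaced by a nearest-centroid classifier so that the example needs only the standard library, and AUBC is taken to mean the trapezoidal area under the accuracy-versus-queries curve divided by the number of queries. All function names (`make_blobs`, `split_pool`, `fit_centroids`, `score`, `run`, `aubc`) are hypothetical. The same model is used both for querying and for evaluation, which loosely mirrors the compatible setting (US-C).

```python
import math
import random


def make_blobs(n=200, seed=0):
    """Synthetic binary data: two 2-D Gaussian blobs (illustrative only)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = 2.0 * label
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


def split_pool(n_samples, test_frac=0.4, n_init=20, seed=0):
    """Return (test, labeled, unlabeled) index lists for one run."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    n_test = int(round(test_frac * n_samples))
    test, rest = idx[:n_test], idx[n_test:]
    return test, rest[:n_init], rest[n_init:]


def fit_centroids(X, y, labeled):
    """Stand-in model (NOT the paper's SVM): one mean vector per class."""
    cents = {}
    for c in (0, 1):
        pts = [X[i] for i in labeled if y[i] == c]
        # A class with no labeled points yet gets no centroid.
        cents[c] = [sum(col) / len(pts) for col in zip(*pts)] if pts else None
    return cents


def score(cents, x):
    """Signed margin: positive means x is closer to the class-1 centroid."""
    d0 = math.dist(x, cents[0]) if cents[0] is not None else math.inf
    d1 = math.dist(x, cents[1]) if cents[1] is not None else math.inf
    return d0 - d1


def accuracy(cents, X, y, test):
    hits = sum((score(cents, X[i]) > 0) == (y[i] == 1) for i in test)
    return hits / len(test)


def aubc(curve):
    """Trapezoidal area under the accuracy curve, divided by the budget."""
    steps = len(curve) - 1
    area = sum((a + b) / 2 for a, b in zip(curve, curve[1:]))
    return area / steps


def run(X, y, strategy="us", seed=0):
    """One active learning run; the budget is the whole unlabeled pool."""
    test, labeled, pool = split_pool(len(X), seed=seed)
    rng = random.Random(seed)
    curve = []
    while True:
        model = fit_centroids(X, y, labeled)
        curve.append(accuracy(model, X, y, test))
        if not pool:
            break
        if strategy == "us":
            # Uncertainty sampling: query the point with the smallest margin.
            pick = min(pool, key=lambda i: abs(score(model, X[i])))
        else:
            # Uniform sampling baseline.
            pick = rng.choice(pool)
        pool.remove(pick)
        labeled.append(pick)
    return aubc(curve), curve


X, y = make_blobs()
aubc_us, curve_us = run(X, y, "us")
aubc_rand, curve_rand = run(X, y, "uniform")
print(f"AUBC uncertainty={aubc_us:.3f} uniform={aubc_rand:.3f}")
```

On such an easy synthetic problem the two strategies may score almost identically; the sketch is meant to show the shape of the protocol, not to reproduce any result from the paper.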