ALiPy: Active Learning in Python

Ying-Peng Tang, Guo-Xiang Li, Sheng-Jun Huang
DOI: https://doi.org/10.48550/arXiv.1901.03802
2019-01-12
Abstract: Supervised machine learning methods usually require a large set of labeled examples for model training. However, in many real applications, unlabeled data are plentiful while labeled data are limited, and acquiring labels is costly. Active learning (AL) reduces the labeling cost by iteratively selecting the most valuable data and querying their labels from the annotator. This article introduces ALiPy, a Python toolbox for active learning. ALiPy provides a module-based implementation of the active learning framework, which allows users to conveniently evaluate, compare, and analyze the performance of active learning methods. In the toolbox, multiple options are available for each component of the learning framework, including data processing, active selection, label query, and results visualization. In addition to implementations of more than 20 state-of-the-art active learning algorithms, ALiPy also lets users easily configure and implement their own approaches under different active learning settings, such as AL for multi-label data, AL with noisy annotators, and AL with different costs. The toolbox is well documented and open source on GitHub, and can be easily installed through PyPI.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the labeling bottleneck in supervised machine learning: training a model usually requires a large amount of labeled data, yet in many practical applications unlabeled data are plentiful while labeled data are scarce and expensive to obtain. Active learning (AL) reduces the labeling cost by iteratively selecting the most valuable examples and querying their labels from an annotator. The paper introduces ALiPy, a Python toolbox that provides a modular implementation of the active learning framework so that users can easily evaluate, compare, and analyze the performance of different active learning methods. ALiPy also lets users configure and implement their own methods under different active learning settings, such as active learning for multi-label data, active learning with noisy annotators, and active learning with different costs. This modularity makes the toolbox flexible for both research and application, and convenient for exploring new active learning strategies.
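The iterative select-query-retrain loop described above can be illustrated with a minimal sketch. This is not ALiPy's API: it uses only the Python standard library, and every name in it (`make_blobs`, `fit`, `margin`, `query_uncertainty`, `query_random`, `run_active_learning`) is hypothetical and chosen for illustration. It assumes a pool-based setting with two classes, a toy nearest-centroid model, and an oracle that simply reveals the true label. The query strategy is passed in as a function, which loosely mirrors the idea of swapping components to compare methods.

```python
import math
import random

def make_blobs(n_per_class=50, seed=0):
    """Toy data: two 2-D Gaussian blobs, labeled 0 and 1 (illustrative only)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, (cx, cy) in enumerate([(-1.0, 0.0), (1.0, 0.0)]):
        for _ in range(n_per_class):
            X.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            y.append(label)
    return X, y

def fit(X, y_known):
    """Nearest-centroid 'model': one mean point per class labeled so far."""
    by_class = {}
    for i, label in y_known.items():
        by_class.setdefault(label, []).append(X[i])
    return {
        c: (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
        for c, pts in by_class.items()
    }

def predict(model, x):
    """Predict the class whose centroid is closest to x."""
    return min(model, key=lambda c: math.dist(x, model[c]))

def margin(model, x):
    """Gap between the two smallest centroid distances; a small gap means uncertain."""
    d = sorted(math.dist(x, c) for c in model.values())
    return d[1] - d[0]

def query_uncertainty(model, X, unlabeled, rng):
    """Select the unlabeled index with the smallest margin."""
    return min(sorted(unlabeled), key=lambda i: margin(model, X[i]))

def query_random(model, X, unlabeled, rng):
    """Baseline: select an unlabeled index uniformly at random."""
    return rng.choice(sorted(unlabeled))

def run_active_learning(X, y_true, X_test, y_test, strategy, n_queries, seed=0):
    """Run the loop and return (labeled indices, test accuracy after each query)."""
    rng = random.Random(seed)
    # Start with one labeled example per class so the margin is defined.
    labeled = [y_true.index(0), y_true.index(1)]
    y_known = {i: y_true[i] for i in labeled}
    unlabeled = set(range(len(X))) - set(labeled)
    history = []
    for _ in range(n_queries):
        model = fit(X, y_known)
        i = strategy(model, X, unlabeled, rng)   # active selection
        y_known[i] = y_true[i]                   # label query (simulated oracle)
        labeled.append(i)
        unlabeled.remove(i)
        model = fit(X, y_known)                  # retrain with the new label
        correct = sum(predict(model, x) == t for x, t in zip(X_test, y_test))
        history.append(correct / len(X_test))
    return labeled, history

if __name__ == "__main__":
    X, y = make_blobs(seed=1)
    X_test, y_test = make_blobs(seed=2)
    for strategy in (query_uncertainty, query_random):
        _, history = run_active_learning(X, y, X_test, y_test, strategy, n_queries=10)
        print(strategy.__name__, [round(a, 2) for a in history])
```

Running the script prints one accuracy curve per strategy, which is the kind of side-by-side comparison the toolbox is meant to support. On data this simple the two curves may not differ much, so the sketch shows the structure of the loop rather than the benefit of any particular strategy.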