Distributed Active Learning.

Pengcheng Shen, Chunguang Li, Zhaoyang Zhang
DOI: https://doi.org/10.1109/access.2016.2572198
IF: 3.9
2016-01-01
IEEE Access
Abstract: Active learning aims to obtain high-accuracy models with as few labeled data as possible by iteratively and carefully selecting the most valuable data for label queries during the learning process, thereby reducing the cost of labeling. Most previous active learning approaches assume centralized processing, where all the unlabeled data are gathered in one place. With the development of distributed applications, distributed processing has attracted considerable interest for settings in which data are distributed across different nodes of a network. In this paper, we focus on distributed active learning (DAL) for the classification problem. We propose a fully decentralized active learning approach that consists of two parts: a distributed sample selection strategy and a distributed classification algorithm. The former helps nodes cooperatively select data based on the uncertainty, diversity, and representativeness of the data. By introducing a randomized preselection method into the strategy, we achieve diversity of the selected data without any information exchange among nodes. The latter helps each node train its local multi-class classification model in a global sense without transmitting original data among nodes. We demonstrate the effectiveness of the proposed DAL approach on several real data sets. Simulation results show that the proposed approach can significantly reduce the number of labeled data needed to obtain a high-accuracy classifier in the distributed case.
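The selection idea described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it is a minimal, assumed version of a single node's query step that combines only two of the named ingredients, randomized preselection and an uncertainty score (here Shannon entropy, chosen for illustration). The function names `entropy` and `select_query` and the parameter `preselect_size` are hypothetical, and the representativeness criterion and the distributed classifier are omitted entirely.

```python
import math
import random


def entropy(probs):
    """Shannon entropy of a class-probability vector (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_query(pool_probs, preselect_size, rng):
    """Pick the index of one local unlabeled sample to send for labeling.

    pool_probs: list of class-probability vectors, one per local unlabeled sample
                (assumed to come from the node's current classifier).
    preselect_size: size of the random candidate subset (hypothetical parameter).
    rng: a random.Random instance, private to this node.
    """
    indices = list(range(len(pool_probs)))
    # Randomized preselection: each node draws its own random candidate subset,
    # so different nodes tend to consider different samples without messaging.
    candidates = rng.sample(indices, min(preselect_size, len(indices)))
    # Uncertainty criterion: among the candidates, take the highest-entropy one.
    return max(candidates, key=lambda i: entropy(pool_probs[i]))


if __name__ == "__main__":
    node_rng = random.Random(0)
    pool = [
        [0.90, 0.05, 0.05],
        [0.40, 0.30, 0.30],
        [0.34, 0.33, 0.33],
        [0.60, 0.20, 0.20],
    ]
    print(select_query(pool, preselect_size=3, rng=node_rng))
```

In this sketch, a smaller `preselect_size` trades some uncertainty for more variety across nodes, since each node scores only its own random subset; setting it to the pool size reduces the step to plain uncertainty sampling.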