Data stream classifier with limited labelled data

Zhongyang XIONG, Xingqin ZHOU, Yufang ZHANG
DOI: https://doi.org/10.3778/j.issn.1002-8331.1304-0457
2015-01-01
Abstract: Most algorithms for data streams address the problems of infinite length and concept drift. However, these algorithms require all instances to be labelled by human experts before they can be used as a training set for a classifier. This is impractical in a high-speed data stream environment because labelling instances is both time-consuming and costly. Yet if a classifier is trained by supervised learning alone, a small number of labelled instances yields a poor classifier. This paper proposes a classification algorithm for data streams based on active learning. The method selects for labelling only a small fraction of instances, namely those classified with low confidence. The number of instances that need to be labelled is thus greatly reduced. The experimental results show that the proposed method can classify concept-drifting data streams correctly using a small number of labelled data.
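The selection step described in the abstract, requesting a label only when the classifier's confidence is low, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the base classifier, the confidence measure, or the drift handling, so the nearest-centroid model, the softmax-over-distance confidence, the threshold value, and the names CentroidStreamClassifier, run_stream, and synthetic_stream are all assumptions made for illustration. The sketch also does nothing to adapt to concept drift.

```python
import math
import random


class CentroidStreamClassifier:
    """Incremental nearest-centroid classifier (a stand-in, not the paper's model)."""

    def __init__(self):
        self.sums = {}    # label -> per-feature running sum of labelled instances
        self.counts = {}  # label -> number of labelled instances seen

    def update(self, x, label):
        """Fold one labelled instance into the running centroid of its class."""
        if label not in self.sums:
            self.sums[label] = [0.0] * len(x)
            self.counts[label] = 0
        self.sums[label] = [s + v for s, v in zip(self.sums[label], x)]
        self.counts[label] += 1

    def predict_with_confidence(self, x):
        """Return (label, confidence), with confidence in [0, 1].

        Assumption: confidence is a softmax over negative distances to the
        class centroids. With fewer than two known classes it is 0.0, so
        the caller always asks for a label.
        """
        if len(self.sums) < 2:
            return next(iter(self.sums), None), 0.0
        dists = {
            label: math.dist(x, [v / self.counts[label] for v in s])
            for label, s in self.sums.items()
        }
        nearest = min(dists.values())
        # Subtracting the nearest distance keeps exp() from underflowing to 0.
        weights = {label: math.exp(-(d - nearest)) for label, d in dists.items()}
        best = max(weights, key=weights.get)
        return best, weights[best] / sum(weights.values())


def run_stream(stream, threshold=0.75):
    """Classify every instance, but request a label only on low confidence."""
    model = CentroidStreamClassifier()
    queried = correct = total = 0
    for x, true_label in stream:
        predicted, confidence = model.predict_with_confidence(x)
        total += 1
        correct += predicted == true_label
        if confidence < threshold:
            # Reading true_label here simulates asking a human expert.
            model.update(x, true_label)
            queried += 1
    return queried, correct, total


def synthetic_stream(n, seed=0):
    """Hypothetical two-class Gaussian stream, used only for demonstration."""
    rng = random.Random(seed)
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 0.0 if label == 0 else 3.0
        yield [rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label


queried, correct, total = run_stream(synthetic_stream(1000))
print(f"queried {queried} of {total} labels, accuracy {correct / total:.2f}")
```

Raising the threshold requests more labels, and lowering it requests fewer. Handling concept drift as the paper describes would need an extra mechanism, such as forgetting old instances or retraining on recent data, which this sketch leaves out.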