Mining Textual Stream with Partial Labeled Instances Using Ensemble Framework

Ge Song, Yan Li, Chunshan Li, Jingjing Chen, Yunming Ye
DOI: https://doi.org/10.14257/ijdta.2014.7.4.05
2014-01-01
International Journal of Database Theory and Application
Abstract: Increasing access to large-scale, high-dimensional, and non-stationary streams in many real applications has made it necessary to design new dynamic classification algorithms. Most existing approaches to textual stream classification train their models on labeled data alone. However, only a limited number of instances can be labeled in a real streaming environment because large-scale data arrive at high speed. It is therefore useful to make unlabeled instances available for training and updating the ensemble models. In this paper, we present a new ensemble framework that learns from textual streams with partially labeled instances. A new semi-supervised cluster-based classifier is proposed as the sub-classifier in our approach. To integrate these sub-classifiers, we propose an adaptive selection method. Empirical evaluation on textual streams shows that our approach outperforms state-of-the-art stream classification algorithms.
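The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a k-means style sub-classifier whose clusters are named by the few labeled instances that fall into them, and a simple selection rule that ranks sub-classifiers by accuracy on the labeled part of the newest chunk. All names (`ClusterClassifier`, `select_and_vote`, `top`) are hypothetical, instances are assumed to be numeric vectors (for text, for example TF-IDF vectors), and each chunk is assumed to contain at least one labeled instance.

```python
import math
from collections import Counter


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


class ClusterClassifier:
    """Hypothetical sub-classifier: cluster all instances, name clusters by labeled members."""

    def __init__(self, k, iters=10):
        self.k, self.iters = k, iters
        self.centroids, self.cluster_labels = [], []

    def _nearest(self, x):
        return min(range(len(self.centroids)), key=lambda i: dist(x, self.centroids[i]))

    def fit(self, X, y):
        # y[i] is None for unlabeled instances; they still shape the clusters.
        self.centroids = [list(x) for x in X[: self.k]]  # deterministic init (assumption)
        for _ in range(self.iters):
            groups = [[] for _ in self.centroids]
            for x in X:
                groups[self._nearest(x)].append(x)
            self.centroids = [mean(g) if g else c for g, c in zip(groups, self.centroids)]
        # Majority vote of labeled members decides each cluster's label.
        votes = [Counter() for _ in self.centroids]
        for x, label in zip(X, y):
            if label is not None:
                votes[self._nearest(x)][label] += 1
        # Clusters with no labeled member fall back to the chunk's majority label.
        fallback = Counter(l for l in y if l is not None).most_common(1)[0][0]
        self.cluster_labels = [v.most_common(1)[0][0] if v else fallback for v in votes]
        return self

    def predict(self, x):
        return self.cluster_labels[self._nearest(x)]


def accuracy(clf, X, y):
    """Accuracy measured on the labeled instances only."""
    labeled = [(x, l) for x, l in zip(X, y) if l is not None]
    return sum(clf.predict(x) == l for x, l in labeled) / len(labeled)


def select_and_vote(ensemble, X_new, y_new, x, top=2):
    """Keep the `top` sub-classifiers that best fit the newest chunk, then majority-vote."""
    ranked = sorted(ensemble, key=lambda c: accuracy(c, X_new, y_new), reverse=True)
    return Counter(c.predict(x) for c in ranked[:top]).most_common(1)[0][0]
```

In this sketch, a sub-classifier trained before a concept drift scores poorly on the labeled instances of the newest chunk and is therefore excluded from the vote, which is one simple way to read "adaptive selection".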