Concept Based Short Text Stream Classification with Topic Drifting Detection

Peipei Li, Lu He, Xuegang Hu, Yuhong Zhang, Lei Li, Xindong Wu
DOI: https://doi.org/10.1109/icdm.2016.0128
2016-01-01
Abstract: Short text stream classification is a challenging and significant task because short text streams are characterized by short length, weak signal, high velocity, and especially topic drifting. However, this challenge has received little attention from the research community. Motivated by this, we propose a new feature extension approach for short text stream classification using a large-scale, general-purpose semantic network obtained from a web corpus. Our approach is built on an incremental ensemble classification model. First, using the open semantic network, we introduce more semantic context into short texts to compensate for data sparsity, and we disambiguate terms by their semantics to reduce the impact of noise. Second, to effectively track hidden topic drifts, we propose a concept-cluster-based topic drifting detection method. Finally, extensive experiments demonstrate that our approach detects topic drifts effectively compared with several well-known concept drifting detection methods for data streams, and that it performs best in the classification of text data streams among several state-of-the-art short text classification approaches.
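
The abstract does not give the details of the drift detector, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes that each term can be mapped to concepts through a lookup (the hypothetical `TOY_CONCEPTS` dictionary stands in for the large semantic network), that the stream arrives in chunks, and that a drift is flagged when the cosine distance between the concept distributions of consecutive chunks exceeds an arbitrary, assumed `threshold`.

```python
from collections import Counter
from math import sqrt

# Hypothetical term-to-concept lookup; a stand-in for a large semantic network.
TOY_CONCEPTS = {
    "apple": ["fruit", "company"],
    "banana": ["fruit"],
    "iphone": ["product", "company"],
    "python": ["language", "animal"],
    "java": ["language"],
    "compiler": ["software"],
}


def concept_vector(texts):
    """Aggregate concept counts over one chunk of short texts."""
    counts = Counter()
    for text in texts:
        for term in text.lower().split():
            # Terms missing from the lookup contribute no concepts.
            counts.update(TOY_CONCEPTS.get(term, []))
    return counts


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two concept count vectors."""
    if not a and not b:
        return 0.0  # two empty chunks are treated as identical
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # one empty chunk: treated as maximally different
    return 1.0 - dot / (norm_a * norm_b)


def detect_drifts(chunks, threshold=0.5):
    """Return indices of chunks whose concept distribution differs from
    the previous chunk's by more than `threshold` (an assumed value)."""
    drifts = []
    previous = None
    for index, chunk in enumerate(chunks):
        current = concept_vector(chunk)
        if previous is not None and cosine_distance(previous, current) > threshold:
            drifts.append(index)
        previous = current
    return drifts


chunks = [
    ["apple banana", "banana apple"],
    ["banana apple iphone"],
    ["python java compiler"],
]
print(detect_drifts(chunks))  # [2]
```

In this toy stream the first two chunks share the fruit and company concepts, so no drift is reported between them, while the third chunk shares no concepts with the second and is flagged. A detector in the spirit of the paper would presumably cluster concepts and feed the drift signal into the incremental ensemble, neither of which is modeled here.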