Three-layer Concept Drifting Detection in Text Data Streams

Yuhong Zhang, Guang Chu, Peipei Li, Xuegang Hu, Xindong Wu
DOI: https://doi.org/10.1016/j.neucom.2017.04.047
IF: 6
2017-01-01
Neurocomputing
Abstract: Text data streams appear widely in real-world applications, where concept drifts pose a significant challenge for classification. Compared with relational data streams, concept drifts hidden in text streams are usually reflected in the relationship between the feature vector and the instance labels. Meanwhile, existing concept drifting detection methods are mainly based on classification error rates. When applied to text streams, these methods perform poorly in terms of false alarms and missed detections. Motivated by this, we first give a systematic analysis of the concept drifts in text data streams. Then, we propose a three-layer concept drifting detection approach, where the three layers are the label space, the feature space, and the mapping relationships between labels and features, respectively. In this approach, the latter two layers are based on the values of WoE (Weight of Evidence) and the IV (Information Value) index. Experimental results show that our approach can improve the performance of concept drifting detection and the accuracy of classification, especially when concept drifts in text data streams are frequent.
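For readers unfamiliar with WoE and IV, the following is a minimal sketch of the textbook definitions applied to term presence in a window of labelled documents. It is not the authors' detection procedure: the function name `woe_iv`, the additive `smoothing` parameter, the binary present/absent binning, and the toy data are all assumptions made for illustration.

```python
import math


def woe_iv(docs, labels, positive, smoothing=0.5):
    """Compute per-term WoE and IV over one window of tokenised documents.

    Hypothetical helper, not taken from the paper. Each term is treated as a
    binary feature (present / absent in a document).
    WoE(bin) = ln(P(bin | positive) / P(bin | other))
    IV(term) = sum over both bins of (P(bin | positive) - P(bin | other)) * WoE(bin)
    Returns (woe_present, iv), two dicts keyed by term.
    """
    pos_docs = [set(d) for d, y in zip(docs, labels) if y == positive]
    neg_docs = [set(d) for d, y in zip(docs, labels) if y != positive]
    vocab = set().union(*pos_docs, *neg_docs)

    woe_present, iv = {}, {}
    for term in vocab:
        # Smoothed share of documents in each class that contain the term;
        # smoothing keeps both shares strictly between 0 and 1.
        p = (sum(term in d for d in pos_docs) + smoothing) / (len(pos_docs) + 2 * smoothing)
        q = (sum(term in d for d in neg_docs) + smoothing) / (len(neg_docs) + 2 * smoothing)
        woe_in = math.log(p / q)               # bin: term present
        woe_out = math.log((1 - p) / (1 - q))  # bin: term absent
        woe_present[term] = woe_in
        iv[term] = (p - q) * woe_in + ((1 - p) - (1 - q)) * woe_out
    return woe_present, iv


# Toy window (made-up data): two "sport" and two "politics" documents.
window = [["goal", "match"], ["goal", "team"], ["vote", "party"], ["vote", "match"]]
window_labels = ["sport", "sport", "politics", "politics"]
woe, iv = woe_iv(window, window_labels, positive="sport")
```

In this sketch a term with positive WoE is evidence for the positive class, a term with negative WoE is evidence against it, and a term's IV summarises how strongly it separates the classes. Comparing these values between consecutive windows is one plausible way to observe a change in the feature-label mapping, though the thresholds and tests the paper uses are not stated in the abstract.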