Data Stream Classification with Data Uncertainty and Concept Drift

Yan-xia LÜ, Cui-rong WANG, Cong WANG, Ying YUAN
DOI: https://doi.org/10.3969/j.issn.0255-8297.2017.05.003
2017-01-01
Abstract: Data on the Web carry considerable uncertainty because of privacy protection, data loss, network errors, etc. In a data stream system, data arrive continuously, so the full data set is never available at any one time. In addition, concept drift often occurs in data streams. This paper constructs an incremental classification model for data stream classification under data uncertainty and concept drift. The model uses a fast decision tree algorithm that can analyze uncertain information quickly and effectively in both the learning stage and the classification stage. In the learning stage, it uses the Hoeffding bound to quickly construct a decision tree model for a data stream with uncertain data. In the classification stage, it uses a weighted Bayes classifier in the tree leaves to improve classification precision. Using a sliding window to replace the tree ensures that the algorithm can deal with concept drift. Experimental results show that the algorithm has good classification accuracy and execution efficiency on both artificial and real data.
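The abstract names the Hoeffding bound as the basis for growing the tree but gives no details of the split rule. As a rough illustration only, the sketch below shows the standard Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), and the usual split test built on it in Hoeffding-tree learners. It is not the authors' algorithm: the function names, the parameter names, and the plain gain comparison are assumptions, and the paper's handling of uncertain attribute values is not reproduced here.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound.

    With probability at least 1 - delta, the true mean of a random variable
    with range `value_range` lies within the returned epsilon of the mean
    observed over `n` independent samples.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Hypothetical split test: split a leaf once the gap between the best
    and second-best attribute scores exceeds the Hoeffding bound."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    eps = hoeffding_bound(value_range=1.0, delta=1e-7, n=1000)
    print(f"epsilon after 1000 examples: {eps:.4f}")
    print("split?", should_split(0.30, 0.15, 1.0, 1e-7, 1000))
```

Because epsilon shrinks as n grows, a leaf that cannot separate its two best attributes early in the stream may still split later, which is what allows a tree of this kind to be built in a single pass over the data.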