Efficient Decision Tree for Evolving Data Streams Based on Frequent Patterns

Meng HAN, Zhi-Hai WANG, Jian DING
DOI: https://doi.org/10.11897/SP.J.1016.2016.01541
2016-01-01
Abstract: Data streams may contain a large amount of useless information or noise. Frequent pattern mining can filter out such useless information and discover patterns. Frequent patterns may carry more information than single attributes, so frequent and discriminative patterns can be used to train classification models effectively. In this paper, we propose a two-step method, PatHT (Pattern-based Hoeffding Tree), to generate decision trees for evolving data stream classification. In the first step, an incremental algorithm, CCFPM (Constraints-based and Closed Frequent Pattern Mining), is proposed to discover the frequent pattern set CFPSet (Closed Frequent Pattern Set). These patterns are closed; that is, they retain the full information of the complete set of frequent patterns while being fewer in number. Each pattern must contain the class attribute so that it can be used for classification in the next step. CCFPM uses a sliding window model and a time decay model to deal with the concept drift problem, and a novel average decay factor is designed to obtain a pattern result set with both high recall and high precision. In the second step, an incremental algorithm, HTreeGrow (Hoeffding Tree Growing), is proposed to train a decision tree that adapts to concept drift, based on CFPSet. A concept drift detector is used to discover concept changes, so the classification model is adjusted automatically. For high-density and low-density data streams, we design different ways of using the pattern sets. The performance of the proposed method is evaluated experimentally. Experiments on real-life data streams show that the proposed method can reduce training time or improve classification accuracy, and experiments on synthetic data streams show that it is superior to other analogous algorithms.
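The abstract does not specify how CCFPM or HTreeGrow work internally, including the exact form of the average decay factor. As a rough illustration only, the Python sketch below shows the two general ingredients under stated assumptions: (1) mining closed patterns that must contain a class attribute, with support weighted by a fixed exponential decay over a count-based sliding window, and (2) the standard Hoeffding-tree split test, epsilon = sqrt(R^2 ln(1/delta) / (2n)). The names `DecayedPatternWindow`, `closed_class_patterns`, `should_split`, the fixed `decay` factor, and the brute-force subset enumeration are all hypothetical and are not the paper's algorithms.

```python
import math
from collections import deque
from itertools import combinations


def hoeffding_bound(value_range, delta, n):
    """Standard Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Hoeffding-tree split test: split once the gain gap exceeds epsilon."""
    return best_gain - second_gain > hoeffding_bound(value_range, delta, n)


class DecayedPatternWindow:
    """Hypothetical sketch: count-based sliding window with exponential time decay."""

    def __init__(self, window_size, decay, class_labels):
        self.window = deque(maxlen=window_size)  # oldest transaction evicted automatically
        self.decay = decay                       # assumed fixed per-arrival decay factor
        self.class_labels = frozenset(class_labels)
        self.clock = 0

    def add(self, transaction):
        self.clock += 1
        self.window.append((self.clock, frozenset(transaction)))

    def decayed_support(self, pattern):
        # Each containing transaction contributes decay^(age), so recent data weighs more.
        return sum(self.decay ** (self.clock - t)
                   for t, items in self.window if pattern <= items)

    def closed_class_patterns(self, min_support):
        # Brute-force enumeration of every non-empty subset (fine only for tiny transactions).
        candidates = set()
        for _, items in self.window:
            for k in range(1, len(items) + 1):
                candidates.update(frozenset(c) for c in combinations(sorted(items), k))
        supports = {p: self.decayed_support(p) for p in candidates}
        frequent = {p: s for p, s in supports.items() if s >= min_support}
        result = {}
        for p, s in frequent.items():
            if not p & self.class_labels:
                continue  # constraint: pattern must contain a class attribute
            # Closed: no frequent proper superset has the same (decayed) support.
            if any(p < q and math.isclose(s, sq) for q, sq in frequent.items()):
                continue
            result[p] = s
        return result


# Illustrative usage with made-up transactions.
window = DecayedPatternWindow(window_size=4, decay=0.9,
                              class_labels={"class=yes", "class=no"})
for tx in [{"a", "b", "class=yes"}, {"a", "class=yes"},
           {"a", "b", "class=yes"}, {"b", "class=no"}]:
    window.add(tx)
closed = window.closed_class_patterns(min_support=1.0)
```

Because every decay weight is positive, a superset has the same decayed support as its subset exactly when every transaction containing the subset also contains the superset, so the usual definition of closedness still holds under decay. Here `window_size`, `decay`, and `min_support` are illustrative values; the paper's average decay factor, its concept drift detector, and its separate handling of high- and low-density streams are not reproduced.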