Constructing Decision Trees for Mining High-Speed Data Streams

Xu Wenhua, Qin Zheng
IF: 1.019
2012-01-01
Chinese Journal of Electronics
Abstract: The Very Fast Decision Tree (VFDT) is one of the most successful and prominent algorithms designed specifically for classifying stream data. In this paper, we develop a new decision tree induction model, CFDT (Clustering Feature Decision Tree), which extends VFDT. CFDT applies a micro-clustering algorithm that scans the data only once to produce the statistical summaries needed for incremental decision tree induction. The micro-clusters also serve as classifiers in the tree leaves, which improves classification accuracy and reinforces the any-time property. Our experiments on synthetic and real-world datasets show that CFDT scales well to data streams while achieving high classification accuracy at high speed.
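To make the idea of single-pass statistical summaries concrete, the following is a minimal sketch of how a micro-cluster might be maintained and used as a classifier inside a tree leaf. It is not the authors' implementation: the class names `MicroCluster` and `MicroClusterLeaf`, the fixed `radius` threshold, and the majority-class prediction rule are all assumptions made for illustration, and the abstract does not specify any of them.

```python
import math


class MicroCluster:
    """Hypothetical clustering-feature summary, updated one example at a time."""

    def __init__(self, dim):
        self.n = 0                    # number of examples absorbed
        self.ls = [0.0] * dim         # per-attribute linear sum
        self.ss = [0.0] * dim         # per-attribute sum of squares
        self.class_counts = {}        # label -> count

    def add(self, x, label):
        # Only the summaries are updated; the raw example is not stored.
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v
        self.class_counts[label] = self.class_counts.get(label, 0) + 1

    def centroid(self):
        return [s / self.n for s in self.ls]

    def majority_class(self):
        return max(self.class_counts, key=self.class_counts.get)


class MicroClusterLeaf:
    """Hypothetical leaf that keeps micro-clusters and predicts from the nearest one."""

    def __init__(self, dim, radius):
        self.dim = dim
        self.radius = radius          # assumed fixed absorption threshold
        self.clusters = []

    def _nearest(self, x):
        best, best_d = None, math.inf
        for c in self.clusters:
            d = math.dist(x, c.centroid())
            if d < best_d:
                best, best_d = c, d
        return best, best_d

    def learn(self, x, label):
        # Absorb x into the nearest micro-cluster, or start a new one if too far.
        c, d = self._nearest(x)
        if c is None or d > self.radius:
            c = MicroCluster(self.dim)
            self.clusters.append(c)
        c.add(x, label)

    def predict(self, x):
        # A prediction is available at any time once one example has been seen.
        c, _ = self._nearest(x)
        return None if c is None else c.majority_class()


leaf = MicroClusterLeaf(dim=2, radius=1.0)
for x, y in [((0.0, 0.0), "a"), ((0.2, 0.1), "a"), ((5.0, 5.0), "b"), ((5.1, 4.9), "b")]:
    leaf.learn(x, y)
print(len(leaf.clusters), leaf.predict((0.1, 0.1)), leaf.predict((4.8, 5.2)))
```

Because each example only increments counts and sums, the stream is read once and memory grows with the number of micro-clusters rather than the number of examples. The same per-attribute sums could, under these assumptions, also supply the statistics a split test needs.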