Mining Concept-Drifting Data Streams with Multiple Semi-Random Decision Trees

Peipei Li, Xuegang Hu, Xindong Wu
DOI: https://doi.org/10.1007/978-3-540-88192-6_78
2008-01-01
Abstract: Classification of concept-drifting data streams has found wide application. However, many classification algorithms for streaming data are designed for fixed features of concept drift and cannot handle the impact of noise on concept-drift detection. This paper presents MSRT, an incremental algorithm based on Multiple Semi-Random decision Trees for concept-drifting data streams. MSRT uses two sliding windows for training and testing, applies the Hoeffding bound inequality to determine thresholds that distinguish true drift from noise, and chooses a classification function to estimate the error rate for periodic concept-drift detection. An extensive empirical study shows that MSRT improves on CVFDT, a state-of-the-art decision-tree algorithm for classifying concept-drifting data streams, in running time, accuracy, and robustness.
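The role of the Hoeffding bound in thresholding can be illustrated with a minimal sketch. The bound itself is standard: for n independent observations with range R, the sample mean exceeds its expectation by more than epsilon = sqrt(R^2 ln(1/delta) / (2n)) with probability at most delta. Everything else below is an assumption made for illustration and is not taken from the paper: the function names, the default delta, and the simple rule that flags drift when the error rate rises by more than epsilon between two test windows.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """One-sided Hoeffding bound.

    Returns epsilon such that the mean of n independent observations,
    each confined to an interval of width value_range, exceeds its
    expectation by more than epsilon with probability at most delta.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def drift_suspected(prev_error: float, curr_error: float,
                    window_size: int, delta: float = 0.05) -> bool:
    """Hypothetical decision rule (an assumption, not the MSRT rule).

    Error rates lie in [0, 1], so the range R is 1. As a simplification,
    the previous window's error rate is treated as a fixed reference:
    an increase larger than epsilon is flagged as possible drift, and a
    smaller increase is attributed to noise or sampling fluctuation.
    """
    epsilon = hoeffding_bound(1.0, delta, window_size)
    return (curr_error - prev_error) > epsilon


if __name__ == "__main__":
    # Threshold for a test window of 1000 examples at delta = 0.05.
    print(hoeffding_bound(1.0, 0.05, 1000))
    # A jump from 10% to 20% error versus a rise from 10% to 12%.
    print(drift_suspected(0.10, 0.20, 1000))
    print(drift_suspected(0.10, 0.12, 1000))
```

With a window of 1000 examples and delta = 0.05, epsilon is about 0.039, so under this assumed rule a rise in error rate from 10% to 20% is flagged while a rise from 10% to 12% is not. A larger window shrinks epsilon, which makes the check more sensitive to small changes.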