Classifying noisy data streams

Yong Wang, Zhanhuai Li, Yang Zhang
DOI: https://doi.org/10.1007/11881599_65
2006-01-01
Abstract: The two main challenges in mining data streams are concept drift and data noise. Current algorithms depend mainly on the robustness of the base classifier or of learning ensembles, and have no active mechanism for dealing with noise. Noise can therefore still cause drastic drops in accuracy. In this paper, we present a clustering-based method to filter hard instances and noisy instances out of data streams. We also propose a trigger to detect concept drift and build RobustBoosting, an ensemble classifier, by boosting the hard instances. We evaluated the RobustBoosting algorithm and the AdaptiveBoosting algorithm [1] on synthetic and real-life data sets. The experimental results show that the proposed method has a substantial advantage over AdaptiveBoosting in prediction accuracy, and that it converges to target concepts efficiently and with high accuracy on data sets with noise levels as high as 40%.