Learning from Distribution-Changing Data Streams Via Decision Tree Model Reuse

Peng ZHAO, Zhi-Hua ZHOU
DOI: https://doi.org/10.1360/ssi-2020-0170
2020-01-01
Scientia Sinica Informationis
Abstract: In many real-world applications, data are collected in the form of streams. Because dynamic environments evolve, the distribution of a data stream generally changes over time. Such distribution changes hinder the application of conventional machine learning approaches, because the fundamental assumption that data are independent and identically distributed (i.i.d.) does not hold in these scenarios. This paper proposes an algorithm based on a decision tree model reuse mechanism for learning from distribution-changing data streams. The proposed algorithm is essentially an online ensemble method that maintains a model pool and updates it by performing decision tree model reuse. The main idea is to exploit the useful knowledge in historical data to help resist the negative effects of distribution changes. We validate the effectiveness of the proposed approach through experiments on synthetic and real-world datasets.
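The model-pool idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify how a decision tree is reused, so that step is replaced here by plain retraining of a depth-1 tree (a stump) on the newest batch. The names `Stump`, `fit_stump`, `ModelPool`, `max_size`, and `eta`, as well as the exponential reweighting rule, are assumptions made for illustration only. The sketch shows only how historical models can stay in a pool and keep voting in proportion to how well they fit recent data.

```python
import math
from dataclasses import dataclass


@dataclass
class Stump:
    """Depth-1 decision tree: one feature, one threshold, one label per side."""
    feature: int
    threshold: float
    left_label: int   # predicted when x[feature] <= threshold
    right_label: int  # predicted otherwise

    def predict(self, x):
        return self.left_label if x[self.feature] <= self.threshold else self.right_label


def error_rate(model, X, y):
    """Fraction of samples in the batch that the model misclassifies."""
    return sum(model.predict(x) != yi for x, yi in zip(X, y)) / len(y)


def fit_stump(X, y):
    """Exhaustive search for the stump with the lowest error on the batch."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for left, right in ((0, 1), (1, 0)):
                cand = Stump(f, t, left, right)
                err = error_rate(cand, X, y)
                if err < best_err:
                    best, best_err = cand, err
    return best


class ModelPool:
    """Hypothetical online ensemble: a bounded pool of models with vote weights."""

    def __init__(self, max_size=3, eta=2.0):
        self.max_size = max_size  # assumed cap on the number of stored models
        self.eta = eta            # assumed step size of the weight update
        self.models = []
        self.weights = []

    def predict(self, x):
        # Weighted vote: a prediction of 1 counts as +w, a prediction of 0 as -w.
        vote = sum(w if m.predict(x) == 1 else -w
                   for m, w in zip(self.models, self.weights))
        return 1 if vote > 0 else 0

    def update(self, X, y):
        # 1. Down-weight historical models by their error on the new batch.
        self.weights = [w * math.exp(-self.eta * error_rate(m, X, y))
                        for m, w in zip(self.models, self.weights)]
        # 2. Train a model on the new batch (stand-in for the reuse step)
        #    and give it a uniform share of the vote.
        self.models.append(fit_stump(X, y))
        self.weights.append(1.0 / len(self.models))
        # 3. Evict the lowest-weight models if the pool is over capacity.
        while len(self.models) > self.max_size:
            worst = self.weights.index(min(self.weights))
            del self.models[worst]
            del self.weights[worst]
        # 4. Renormalise so the weights sum to one.
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]


pool = ModelPool()
X = [(0.1,), (0.3,), (0.7,), (0.9,)]
pool.update(X, [0, 0, 1, 1])   # concept A: label 1 for large x
print(pool.predict((0.8,)))    # 1
pool.update(X, [1, 1, 0, 0])   # concept B: labels flipped
print(pool.predict((0.8,)))    # 0
```

After the labels flip, the stump trained on the first concept misclassifies the whole batch, so its weight shrinks and the newer stump dominates the vote. If the first concept later recurs, the old stump's weight is left undiminished by the update while the second stump's is cut, which is one simple way historical knowledge can remain useful under distribution change.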