Discovering an Evolutionary Classifier over a High-speed Nonstatic Stream

Jiong Yang, Xifeng Yan, Jiawei Han, Wei Wang
DOI: https://doi.org/10.1007/1-84628-284-5_13
2005-01-01
Abstract: With the emergence of large-volume and high-speed streaming data, mining data streams has become a focus of increasing interest. The major new challenges in streaming data mining are as follows: since streams may flow in and out indefinitely and at fast speed, it is usually expected that a stream-mining process can only scan a data stream once; and since the characteristics of the data may evolve over time, it is desirable to incorporate the evolving features of data streams. This paper investigates the issues of developing a high-speed classification method for streaming data with concept drifts. Among several popular classification techniques, the naïve Bayesian classifier is chosen due to its low construction cost, ease of incremental maintenance, and high accuracy. An efficient algorithm, called EvoClass (Evolutionary Classifier), is devised. EvoClass builds an incremental, evolutionary Bayesian classifier on streaming data. A train-and-test method is employed to discover the changes in the characteristics of the data and the need for construction of a new classifier. In addition, divergence is utilized to quantify the changes in the classifier and inform the user what aspects of the data characteristics have evolved. Finally, an intensive empirical study has been performed that demonstrates the effectiveness and efficiency of the EvoClass method.
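The ideas named in the abstract (a count-based naïve Bayesian classifier that is updated incrementally, a train-and-test check on each incoming block, and a divergence measure over the classifier's distributions) can be illustrated with a minimal sketch. This is not the EvoClass algorithm itself, whose details the abstract does not give. The class and function names, the Laplace smoothing, the fixed accuracy threshold `min_accuracy`, and the choice of Kullback-Leibler divergence are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class IncrementalNB:
    """Categorical naive Bayes stored as counts, so one pass over the data suffices."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                    # Laplace smoothing (assumed)
        self.class_counts = defaultdict(int)  # label -> record count
        self.value_counts = defaultdict(int)  # (attr, value, label) -> count
        self.values = defaultdict(set)        # attr -> values seen so far

    def update(self, x, y):
        """Fold one labeled record into the counts."""
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.value_counts[(i, v, y)] += 1
            self.values[i].add(v)

    def distribution(self, i, y, values):
        """Smoothed P(attribute i = v | y) for every v in `values`."""
        n = self.class_counts.get(y, 0)
        denom = n + self.alpha * len(values)
        return {v: (self.value_counts.get((i, v, y), 0) + self.alpha) / denom
                for v in values}

    def predict(self, x):
        """Return the label with the highest log posterior, or None if untrained."""
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for y, n in self.class_counts.items():
            score = math.log(n / total)
            for i, v in enumerate(x):
                seen = self.values.get(i, set()) | {v}
                score += math.log(self.distribution(i, y, seen)[v])
            if score > best_score:
                best, best_score = y, score
        return best


def process_block(model, block, min_accuracy=0.7):
    """Train-and-test step: test the model on a labeled block first.

    If accuracy falls below `min_accuracy` (an assumed threshold), a new
    classifier is built from the block alone; otherwise the block is folded
    into the existing counts. Returns (model, drift_detected).
    """
    if model.class_counts:
        correct = sum(model.predict(x) == y for x, y in block)
        if correct / len(block) < min_accuracy:
            fresh = IncrementalNB(model.alpha)
            for x, y in block:
                fresh.update(x, y)
            return fresh, True
    for x, y in block:
        model.update(x, y)
    return model, False


def kl_divergence(new, old, i, y):
    """KL(new || old) between the two models' P(attribute i | y)."""
    values = new.values.get(i, set()) | old.values.get(i, set())
    p = new.distribution(i, y, values)
    q = old.distribution(i, y, values)
    return sum(p[v] * math.log(p[v] / q[v]) for v in values)
```

In this sketch, a caller would keep a reference to the old model when `process_block` reports drift and compare the two models attribute by attribute with `kl_divergence`; attributes with the largest values are the ones whose class-conditional distribution changed most.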