Data Stream Clustering and Modeling Using Context-Trees

Wei Jiang,Pierre Brice
DOI: https://doi.org/10.1109/icsssm.2009.5175016
2009-01-01
Abstract:Many applications such as telecommunication and commercial video broadcasting streams, computer systems logs, and web clicks are categorical or mixed-value data streams that exhibit context-dependency. Models that try to capture this context-dependency tend not to be scalable. This paper offers a solution to the scalability problem of these models by providing a method for generating them around relevant aggregates of these data streams rather than the individual samples. The approach expands existing clustering techniques for static categorical data sets to predictive models of data streams based on Variable Length Markov models of clusters. The paper includes theoretical and experimental evaluations of the technique as well as comparison with other prominent clustering techniques for categorical data streams.
What problem does this paper attempt to address?