A Framework for Projected Clustering of High Dimensional Data Streams

Charu C. Aggarwal, Jiawei Han, Jianyong Wang, Philip S. Yu
DOI: https://doi.org/10.1016/B978-012088469-8.50075-9
2004-01-01
Abstract: The data stream problem has been studied extensively in recent years, because of the great ease in collection of stream data. The nature of stream data makes it essential to use algorithms which require only one pass over the data. Recently, single-scan, stream analysis methods have been proposed in this context. However, a lot of stream data is high-dimensional in nature. High-dimensional data is inherently more complex in clustering, classification, and similarity search. Recent research discusses methods for projected clustering over high-dimensional data sets. This method is, however, difficult to generalize to data streams because of the complexity of the method and the large volume of the data streams. In this paper, we propose a new, high-dimensional, projected data stream clustering method, called HPStream. The method incorporates a fading cluster structure and the projection-based clustering methodology. It is incrementally updatable and is highly scalable on both the number of dimensions and the size of the data streams, and it achieves better clustering quality in comparison with the previous stream clustering methods. Our performance study with both real and synthetic data sets demonstrates the efficiency and effectiveness of our proposed framework and implementation methods.