A New Streaming Data Cluster Algorithm Based on Sliding Window

CAI Ni-ming, WANG Han-hu, CHEN Mei
DOI: https://doi.org/10.3969/j.issn.1673-629X.2011.01.007
2011-01-01
Abstract: The most recent distribution of a data stream is often what matters most. CluStream is a traditional clustering algorithm based on the landmark model, which does not eliminate expired tuples, so it cannot accurately reflect the current data distribution of the stream. The sliding window is an approximation method that focuses on the most recent data in the stream. To improve the quality and efficiency of data stream clustering analysis, this paper proposes an improved algorithm based on CluStream. A sliding window is used to support data processing, and an improved k-means clustering is used to reduce the amount of computation in the clustering operation. The optimized algorithm eliminates expired tuples promptly while processing newly arrived tuples in real time, which yields more accurate analysis results. Compared with CluStream, the optimized algorithm has lower memory overhead and faster data processing, so the clustering results are more reasonable and clearer.
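The abstract does not spell out the improved k-means procedure, so the following is only a minimal sketch of the general idea it describes: keep per-cluster summaries over a sliding window, subtract expired tuples as they leave the window, and assign each newly arrived tuple to the nearest centroid. The class name `SlidingWindowClusterer`, the time-based window, and the rule of seeding an empty cluster with the next arriving tuple are all assumptions made for illustration, not the authors' method.

```python
from collections import deque
import math


class SlidingWindowClusterer:
    """Illustrative only: k running centroids over a time-based sliding window."""

    def __init__(self, k, window):
        self.k = k                 # number of clusters (assumed fixed)
        self.window = window       # window length in time units (assumed)
        self.buffer = deque()      # (timestamp, point, cluster index), oldest first
        self.counts = [0] * k      # tuples currently held by each cluster
        self.sums = [None] * k     # per-cluster coordinate sums

    def _evict(self, now):
        # Drop tuples outside the window (now - window, now] and
        # subtract them from the summary of the cluster they belonged to.
        while self.buffer and self.buffer[0][0] <= now - self.window:
            _, point, idx = self.buffer.popleft()
            self.counts[idx] -= 1
            self.sums[idx] = [s - x for s, x in zip(self.sums[idx], point)]

    def centroid(self, idx):
        # Mean of the tuples currently in the cluster, or None if it is empty.
        if self.counts[idx] == 0:
            return None
        return [s / self.counts[idx] for s in self.sums[idx]]

    def insert(self, timestamp, point):
        # Expire old tuples first, then place the new one.
        self._evict(timestamp)
        empty = [i for i in range(self.k) if self.counts[i] == 0]
        if empty:
            # Assumed seeding rule: an empty cluster takes the next tuple.
            idx = empty[0]
            self.sums[idx] = [0.0] * len(point)
        else:
            # Otherwise assign to the nearest current centroid.
            idx = min(range(self.k),
                      key=lambda i: math.dist(self.centroid(i), point))
        self.counts[idx] += 1
        self.sums[idx] = [s + x for s, x in zip(self.sums[idx], point)]
        self.buffer.append((timestamp, list(point), idx))
        return idx
```

Because only counts and coordinate sums are stored per cluster, removing an expired tuple is a constant-time subtraction rather than a re-clustering pass. The sketch still buffers the raw tuples inside the window so it knows what to subtract; a memory-conscious design would likely summarize the window in coarser blocks instead.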