Improved Data Stream Clustering Algorithm Based on Temporal Density Features

Yu-zhong CHEN, Song-rong GUO, Kun GUO, Guo-hui LI, Wei-chao LIN
DOI: https://doi.org/10.3969/j.issn.1000-1220.2018.01.014
2018-01-01
Abstract: The classic CluStream algorithm limits the growth of the number of micro-clusters during online micro-clustering and forcibly merges micro-clusters. This degrades the online clustering result, lowers data stream clustering quality, and leaves the algorithm unable to adapt to massive data. In this paper, an improved CluStream clustering algorithm based on temporal density features is proposed. First, the concept of micro-cluster temporal density is introduced and used to describe micro-clusters. Second, a new micro-cluster deletion mechanism is developed so that the number of micro-clusters can grow dynamically according to the online micro-clusters. Finally, a parallel computation framework is proposed to implement the algorithm and meet the demands of massive real-time data processing. Comparative experiments on artificial and real data sets show that the new algorithm achieves higher clustering quality than the CluStream algorithm.
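The abstract does not give the formula for temporal density or the exact deletion rule, so the following is only a minimal sketch of the general idea under stated assumptions. It keeps a CluStream-style cluster feature vector per micro-cluster, attaches a hypothetical exponentially decayed weight as a stand-in for "temporal density", and deletes stale micro-clusters instead of forcing a merge under a fixed cap. The names `MicroCluster`, `update`, `radius`, `decay`, and `min_density` are illustrative and are not taken from the paper.

```python
import math


class MicroCluster:
    """CluStream-style cluster feature vector plus a decayed weight.

    The decayed weight is an ASSUMED stand-in for temporal density;
    the paper's actual definition is not given in the abstract.
    """

    def __init__(self, point, t):
        self.n = 1                              # number of absorbed points
        self.cf1x = list(point)                 # linear sum per dimension
        self.cf2x = [v * v for v in point]      # squared sum per dimension
        self.cf1t = t                           # linear sum of timestamps
        self.cf2t = t * t                       # squared sum of timestamps
        self.weight = 1.0                       # hypothetical decayed weight
        self.last_update = t

    def density(self, now, decay):
        # Weight fades exponentially with the time since the last update.
        return self.weight * 2 ** (-decay * (now - self.last_update))

    def absorb(self, point, t, decay):
        # Decay the old weight to time t, then count the new point.
        self.weight = self.density(t, decay) + 1.0
        self.last_update = t
        self.n += 1
        for i, v in enumerate(point):
            self.cf1x[i] += v
            self.cf2x[i] += v * v
        self.cf1t += t
        self.cf2t += t * t

    def centroid(self):
        return [s / self.n for s in self.cf1x]


def update(clusters, point, t, radius=1.0, decay=0.1, min_density=0.05):
    """Process one stream point; all thresholds are assumed values."""
    # Delete micro-clusters whose density has faded (no forced merging).
    clusters[:] = [c for c in clusters if c.density(t, decay) >= min_density]
    nearest = min(
        clusters,
        key=lambda c: math.dist(c.centroid(), point),
        default=None,
    )
    if nearest is not None and math.dist(nearest.centroid(), point) <= radius:
        nearest.absorb(point, t, decay)
    else:
        # No cap on the count: a new micro-cluster is simply added.
        clusters.append(MicroCluster(point, t))
    return clusters
```

In this sketch the number of micro-clusters is bounded only by how quickly stale ones fade, which is one possible reading of "dynamic" growth; the parallel framework mentioned in the abstract is not modeled here.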