A Density-Based Clustering Structure Mining Algorithm for Data Streams

YU Yan-Wei, WANG Huan, WANG Qin, ZHAO Jin-Dong
DOI: https://doi.org/10.1145/2351316.2351326
2015-01-01
Abstract: Today, advances in hardware and storage techniques create a demand for automatic data mining on data streams. Clustering analysis is an important tool for data stream mining. Although density-based clustering algorithms for data streams can now discover clusters of arbitrary shapes, their effectiveness depends on parameter settings. In addition, the global parameters used in these algorithms limit their ability to discover overlapping clusters. In this paper, we propose OPCluStream, a novel density-based clustering structure mining algorithm for data streams that can adaptively discover both clusters of arbitrary shapes and overlapping clusters. Satisfying the one-pass constraint, OPCluStream indexes points with a tree topology in which each point links to related points through directed pointers. This tree topology records the relationships among points; these relationships represent clustering results over a broad range of Eps settings, and clusters can be discovered by transforming the topology into a clustering structure. The clustering structure is equivalent to the index structure and is convenient to use. OPCluStream also clusters efficiently, because it indexes points with a tree topology and restricts computation to a limited area when new points are added to the data stream. Experiments on synthetic and real data sets illustrate the effectiveness, efficiency, and insights provided by our method.
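The abstract does not specify OPCluStream's data structures, so the following is only a minimal Python sketch of the general idea that one linked structure over stream points can answer clustering queries for many Eps values at once. It swaps in a simpler, well-known technique rather than the authors' method: a minimum spanning forest over all pairs of points closer than a hypothetical upper bound `max_eps`, queried by single-linkage at any `eps <= max_eps`. This ignores density (it corresponds to DBSCAN's degenerate low-MinPts case, with isolated points reported as singleton clusters instead of noise). A uniform grid with cell size `max_eps` stands in for the "limited computing area": each new point is compared only with points in its 3x3 block of neighboring cells. The names `StreamForest`, `UnionFind`, `max_eps`, `insert`, and `clusters` are all hypothetical.

```python
import math
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[rb] = ra
        return True


class StreamForest:
    """Hypothetical sketch, NOT OPCluStream's structure: a minimum spanning
    forest over pairs closer than max_eps, queried by single-linkage."""

    def __init__(self, max_eps):
        self.max_eps = max_eps          # assumed upper bound on queried Eps
        self.points = []                # (x, y) tuples in arrival order
        self.grid = defaultdict(list)   # grid cell -> indices of its points
        self.edges = []                 # forest edges as (dist, i, j)

    def _cell(self, p):
        return (math.floor(p[0] / self.max_eps), math.floor(p[1] / self.max_eps))

    def insert(self, p):
        """Add one point; only the 3x3 cells around it are searched."""
        v = len(self.points)
        cx, cy = self._cell(p)
        new_edges = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for u in self.grid.get((cx + dx, cy + dy), ()):
                    d = math.dist(p, self.points[u])
                    if d <= self.max_eps:
                        new_edges.append((d, u, v))
        self.points.append(p)
        self.grid[(cx, cy)].append(v)
        # Kruskal over the old forest edges plus the new point's edges:
        # a minimum forest of the enlarged graph never needs other edges.
        uf = UnionFind(len(self.points))
        self.edges = [e for e in sorted(self.edges + new_edges)
                      if uf.union(e[1], e[2])]

    def clusters(self, eps):
        """Label each point by its component among forest edges <= eps."""
        if eps > self.max_eps:
            raise ValueError("eps must not exceed max_eps")
        uf = UnionFind(len(self.points))
        for d, i, j in self.edges:
            if d <= eps:
                uf.union(i, j)
        roots = {}
        return [roots.setdefault(uf.find(i), len(roots))
                for i in range(len(self.points))]


forest = StreamForest(max_eps=2.0)
for p in [(0.0, 0.0), (2.0, 0.0), (1.0, 0.0), (10.0, 0.0), (10.5, 0.0)]:
    forest.insert(p)
print(forest.clusters(1.5))  # [0, 0, 0, 1, 1]
print(forest.clusters(0.6))  # [0, 1, 2, 3, 3]
```

In the usage example, the third point arrives between the first two and bridges them, so the forest is rebuilt to link all three; the same stored forest then answers both the `eps = 1.5` and `eps = 0.6` queries without reprocessing the stream. Rebuilding with Kruskal on every insertion keeps the sketch short and correct but costs O(n log n) per point, which is far from the efficiency the abstract claims for OPCluStream.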