Two-layer Clustering over Data Stream with Fault-Tolerance

由育阳,朱纪洪,杨志宏
DOI: https://doi.org/10.13700/j.bh.1001-5965.2012.05.011
2012-01-01
Abstract: A new fault-tolerant clustering algorithm for evolving data streams, named FTGDStream (fault-tolerant grid-density clustering over data stream), is proposed. It introduces an appropriate relaxation of conditions in order to discover generalised knowledge in real-world data polluted by noise. First, FTGDStream uses similarity measures and the lifting wavelet to construct a synopsis, HLSFTS (hierarchical lifting scheme fault-tolerant synopses), which implements the online micro-cluster phase. Second, FTGDStream uses grid-density clustering to implement the offline macro-cluster phase. The high compression ratio of HLSFTS in the micro-cluster phase reduces the computational load of the grid-density clustering algorithm in the macro-cluster phase and improves the efficiency of the two-layer algorithm. Simulation on a UCI data set shows that FTGDStream can cluster data of arbitrary shape in the data space and is suitable for high-dimensional data streams. FTGDStream is an efficient clustering algorithm with fault tolerance.
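The two building blocks named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' FTGDStream or HLSFTS, whose details the abstract does not give. It only shows a single Haar lifting step (split, predict, update), which halves a window of stream values, and a basic grid-density grouping that joins adjacent dense cells. The function names `haar_lifting_step` and `grid_density_clusters`, and the parameters `cell_size` and `min_density`, are hypothetical and chosen for illustration.

```python
from collections import defaultdict, deque
from itertools import product


def haar_lifting_step(values):
    """One level of the Haar lifting scheme (illustrative, not HLSFTS).

    Returns (approx, details); approx holds the pairwise means and is
    half as long as the input, which is the source of the compression.
    """
    if len(values) % 2:
        raise ValueError("expected an even number of samples")
    evens, odds = values[0::2], values[1::2]                 # split
    details = [o - e for e, o in zip(evens, odds)]           # predict
    approx = [e + d / 2 for e, d in zip(evens, details)]     # update
    return approx, details


def grid_density_clusters(points, cell_size, min_density):
    """Group adjacent dense grid cells into clusters (illustrative).

    A cell is dense when it holds at least min_density points. Cells
    are adjacent when their indices differ by at most 1 per dimension.
    Assumption: the neighbour scan is exponential in the number of
    dimensions, so this sketch only suits low-dimensional data.
    """
    counts = defaultdict(int)
    for p in points:
        counts[tuple(int(c // cell_size) for c in p)] += 1
    dense = {cell for cell, n in counts.items() if n >= min_density}

    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:                                         # breadth-first flood fill
            cell = queue.popleft()
            component.append(cell)
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbour = tuple(c + o for c, o in zip(cell, offset))
                if neighbour in dense and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(sorted(component))
    return clusters
```

Because clusters are formed from chains of touching dense cells rather than distances to a centre, this kind of grouping can follow arbitrarily shaped regions, and sparse cells are left out as noise.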