Supervised Adaptive Incremental Clustering for Data Stream of Chunks

Laiwen Zheng, Hong Huo, Yiyou Guo, Tao Fang
DOI: https://doi.org/10.1016/j.neucom.2016.09.054
IF: 6
2016-01-01
Neurocomputing
Abstract: Many supervised clustering algorithms have been developed to find the optimal clusters for static datasets by presetting some parameters, but they are seldom suitable for dynamic datasets such as a data stream of chunks. To find the optimal clusters of a data stream of chunks, a novel Supervised Adaptive Incremental Clustering (SAIC) algorithm is proposed. SAIC can automatically cluster dynamic datasets of arbitrary shapes and sizes. It consists of a learning phase and a post-processing phase. In the learning phase, each cluster is updated adaptively according to a learning rate calculated from its counter value. All data points are shuffled at each iteration to make SAIC insensitive to the input order of the data points. In the post-processing phase, outliers or boundary points are eliminated according to the counter value of each cluster and the number of iterations. Four synthetic datasets and fourteen UCI datasets are used to evaluate the performance of SAIC. The experiments on the UCI datasets show that SAIC matches or outperforms some other supervised clustering algorithms and several unsupervised incremental clustering algorithms. In addition, three data streams of chunks are used to evaluate SAIC from different aspects, showing that SAIC offers scalability and incremental learning ability for clustering data streams of chunks.
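The two phases described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not give the update formulas, so the rule for creating a new cluster (label mismatch with the nearest cluster), the learning rate `1 / count`, the pruning threshold `min_points_per_iter`, and the function names `learn` and `prune` are all assumptions made for illustration.

```python
import math
import random


def learn(points, labels, n_iter=5, seed=0):
    """Learning phase (sketch): grow and update labeled clusters."""
    rng = random.Random(seed)
    clusters = []  # each cluster: {"center": [...], "label": y, "count": n}
    order = list(range(len(points)))
    for _ in range(n_iter):
        rng.shuffle(order)  # reshuffle every iteration to reduce order sensitivity
        for i in order:
            x, y = points[i], labels[i]
            nearest = min(
                clusters, key=lambda c: math.dist(c["center"], x), default=None
            )
            if nearest is None or nearest["label"] != y:
                # Assumed rule: no cluster yet, or the nearest cluster has a
                # different label, so start a new cluster at this point.
                clusters.append({"center": list(x), "label": y, "count": 1})
                continue
            nearest["count"] += 1
            rate = 1.0 / nearest["count"]  # assumed learning rate from the counter
            nearest["center"] = [
                c + rate * (xi - c) for c, xi in zip(nearest["center"], x)
            ]
    return clusters


def prune(clusters, n_iter, min_points_per_iter=2):
    """Post-processing (sketch): drop clusters that absorbed too few points.

    The threshold combines the counter value with the number of iterations;
    the exact criterion here is an assumption.
    """
    return [c for c in clusters if c["count"] >= min_points_per_iter * n_iter]


# Toy usage: two well-separated labeled groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
clusters = prune(learn(points, labels, n_iter=5), n_iter=5)
```

With a learning rate of `1 / count`, each center is the running mean of the points assigned to it, so clusters that have seen many points move less on each update. A cluster that only ever absorbs a single point accumulates a counter no larger than the number of iterations, which is why the sketch prunes against a multiple of `n_iter`.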