Enhanced Synchronization-Inspired Clustering for High-Dimensional Data

Chen Lei,Guo Qinghua,Liu Zhaohua,Zhang Shiwen,Zhang Hongqiang
DOI: https://doi.org/10.1007/s40747-020-00191-y
IF: 6.7
2020-01-01
Complex & Intelligent Systems
Abstract:The synchronization-inspired clustering algorithm (Sync) is a novel and outstanding clustering algorithm, which can accurately cluster datasets with any shape, density and distribution. However, the high-dimensional dataset with high dimensionality, high noise, and high redundancy brings some new challenges for the synchronization-inspired clustering algorithm, resulting in a significant increase in clustering time and a decrease in clustering accuracy. To address these challenges, an enhanced synchronization-inspired clustering algorithm, namely SyncHigh, is developed in this paper to quickly and accurately cluster the high-dimensional datasets. First, a PCA-based (Principal Component Analysis) dimension purification strategy is designed to find the principal components in all attributes. Second, a density-based data merge strategy is constructed to reduce the number of objects participating in the synchronization-inspired clustering algorithm, thereby speeding up clustering time. Third, the Kuramoto Model is enhanced to deal with mass differences between objects caused by the density-based data merge strategy. Finally, extensive experimental results on synthetic and real-world datasets show the effectiveness and efficiency of our SyncHigh algorithm.
What problem does this paper attempt to address?