A Novel Stratification Clustering Algorithm Based on a New Local Density Estimation Method and an Improved Local Inter-Cluster Distance Measure
Jianfang Qi, Yue Li, Haibin Jin, Jianying Feng, Dong Tian, Weisong Mu
DOI: https://doi.org/10.1007/s13042-023-01893-8
2023-01-01
International Journal of Machine Learning and Cybernetics
Abstract: Recently, clustering datasets with different shapes, densities and noise has attracted increasing attention from scholars. However, most current clustering algorithms improve clustering performance at the expense of simplicity, and they cannot strike a good balance between clustering quality and operability for users. To solve this problem, we propose a new algorithm called stratification clustering based on density, hierarchy and partition (SDHP), which effectively integrates the advantages of density-based, hierarchical and partitioning clustering. First, a new parameter-free local density estimation strategy based on the bidirectional natural neighbor relationship, named local density based on natural neighbor (NN-LD), is proposed to identify the core part of each sub-cluster. Then, a new stratification strategy based on NN-LD, named Stratification-NN-LD (S-NN-LD), is proposed to divide the entire dataset into two layers, the core layer and the edge layer, which simplifies the dataset structure and makes the algorithm robust to noise. Next, the hierarchical single-linkage algorithm is applied to the core layer to obtain the initial clustering result, because it performs well on datasets with various shapes and densities. Finally, to improve the clustering accuracy of samples in the edge layer, a new local inter-cluster distance measure based on the average of neighbor distances is combined with partitioning clustering to match these samples to the sub-clusters in the initial clustering result. Experiments on twenty datasets show that SDHP achieves better clustering accuracy and is better suited to practical use than four popular hierarchical clustering algorithms, four recent density-based clustering algorithms, and a state-of-the-art partitioning clustering algorithm. The source code can be downloaded from https://github.com/qi111678/SDHP.
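The abstract outlines a four-stage pipeline (natural-neighbor density, core/edge stratification, single-linkage on the core, distance-based assignment of edge points) but does not give the NN-LD formula, the S-NN-LD threshold, or the exact local inter-cluster distance. The following minimal Python sketch wires the stages together under explicitly hypothetical choices: a common formulation of natural-neighbor search with mutual k-nearest neighbors, density taken as the number of natural neighbors divided by their summed distance, a mean-density cut between the core and edge layers, a user-supplied `n_clusters` for the single-linkage cut, and edge points assigned greedily by their mean distance to the m nearest members of each cluster. All function names (`natural_neighbors`, `local_density`, `stratify`, `single_linkage`, `assign_edges`, `sdhp_sketch`) are illustrative only; the authors' actual implementation is in the linked repository.

```python
import math
from itertools import combinations


def natural_neighbors(points):
    """Natural-neighbor search (one common variant): grow k until every point
    has a reverse neighbor or the number of points without one stops changing."""
    n = len(points)
    order = [sorted((j for j in range(n) if j != i),
                    key=lambda j: math.dist(points[i], points[j]))
             for i in range(n)]
    reverse = [0] * n
    k, prev_zero = 0, -1
    while k < n - 1:
        k += 1
        for i in range(n):
            reverse[order[i][k - 1]] += 1  # i's k-th neighbor gains a reverse neighbor
        zero = reverse.count(0)
        if zero == 0 or zero == prev_zero:
            break
        prev_zero = zero
    # Bidirectional relation: q is a natural neighbor of p only if each
    # appears in the other's k-nearest-neighbor list.
    knn = [set(order[i][:k]) for i in range(n)]
    return [{j for j in knn[i] if i in knn[j]} for i in range(n)]


def local_density(points, nan):
    """HYPOTHETICAL stand-in for NN-LD: natural-neighbor count divided by the
    summed distance to those neighbors (0 for points with none)."""
    dens = []
    for i, nbrs in enumerate(nan):
        total = sum(math.dist(points[i], points[j]) for j in nbrs)
        dens.append(len(nbrs) / (total + 1e-12) if nbrs else 0.0)
    return dens


def stratify(dens):
    """HYPOTHETICAL stand-in for S-NN-LD: points at or above the mean density
    form the core layer, the rest form the edge layer."""
    mean = sum(dens) / len(dens)
    core = [i for i, d in enumerate(dens) if d >= mean]
    edge = [i for i, d in enumerate(dens) if d < mean]
    return core, edge


def single_linkage(points, idx, n_clusters):
    """Single-linkage on the given indices: repeatedly join the closest pair of
    points from different clusters (Kruskal-style union-find)."""
    parent = {i: i for i in idx}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    components = len(idx)
    for a, b in sorted(combinations(idx, 2),
                       key=lambda p: math.dist(points[p[0]], points[p[1]])):
        if components <= n_clusters:
            break
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1
    roots = sorted({find(i) for i in idx})
    return {i: roots.index(find(i)) for i in idx}


def assign_edges(points, labels, edge, m=3):
    """HYPOTHETICAL local inter-cluster distance: mean distance from an edge
    point to its m nearest members of each cluster; join the closest cluster."""
    for i in edge:
        best, best_d = None, math.inf
        for c in sorted(set(labels.values())):
            dists = sorted(math.dist(points[i], points[j])
                           for j, lab in labels.items() if lab == c)
            d = sum(dists[:m]) / len(dists[:m])
            if d < best_d:
                best, best_d = c, d
        labels[i] = best  # assigned edge points count as members for later ones
    return labels


def sdhp_sketch(points, n_clusters):
    nan = natural_neighbors(points)
    dens = local_density(points, nan)
    core, edge = stratify(dens)
    labels = single_linkage(points, core, n_clusters)
    edge.sort(key=lambda i: -dens[i])  # attach the densest edge points first
    labels = assign_edges(points, labels, edge)
    return [labels[i] for i in range(len(points))]
```

For example, on two well-separated unit squares that each include a center point, `[(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]`, calling `sdhp_sketch(points, n_clusters=2)` should return `[0, 0, 0, 0, 0, 1, 1, 1, 1, 1]`. Under these assumed rules only the two center points reach the core layer, and the corners are attached afterwards as edge points. The quadratic pure-Python neighbor search is chosen for readability, not speed.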