An Effective Partitional Clustering Algorithm Based on New Clustering Validity Index

Erzhou Zhu, Ruhui Ma
DOI: https://doi.org/10.1016/j.asoc.2018.07.026
IF: 8.7
2018-01-01
Applied Soft Computing
Abstract: As an unsupervised pattern classification method, clustering partitions an input dataset into groups, or clusters. It plays an important role in identifying the natural structure of the target dataset and is now widely used in data mining, pattern recognition, image processing, and other fields. However, because of differing parameter settings and the random selection of initial centers, traditional clustering algorithms may produce different partitions for the same dataset. A clustering validity index (CVI) is an important means of evaluating the quality of the partitions that clustering algorithms produce. However, many existing CVIs suffer from complex computation, low time efficiency, and a narrow range of applications. To make clustering more stable, traditional K-means is first improved with an initial center selection method based on density parameters, rather than selecting initial centers at random. Then, to widen the range of applications of clustering and to better evaluate partition results, a new variance-based clustering validity index (VCVI) is designed from the point of view of the spatial distribution of the dataset. Finally, a new partitional clustering algorithm that integrates the improved K-means algorithm with the newly introduced VCVI is designed to determine the optimal number of clusters ($K_{opt}$) for a wide range of datasets. Furthermore, the commonly used empirical rule $K_{max} \le \sqrt{n}$ is reasonably explained by the newly designed VCVI. The new algorithm integrated with VCVI is compared with traditional algorithms integrated with five commonly used CVIs. The experimental results show that the new clustering method is more accurate and more stable while consuming relatively less running time. © 2018 Elsevier B.V. All rights reserved.
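The overall procedure described in the abstract (deterministic seeding, one K-means run per candidate K up to the empirical bound of the square root of n, and selection of K by a validity index) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the abstract does not give the VCVI formula or the density-parameter seeding rule, so the code substitutes a Calinski-Harabasz-style variance ratio for the index and a farthest-point rule for the seeding. All function names (`init_centers`, `variance_ratio`, `choose_k`, `make_blobs`) are hypothetical and not taken from the paper.

```python
import math
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def init_centers(points, k):
    """ASSUMPTION: deterministic farthest-point seeding, used here only as a
    stand-in for the paper's density-parameter-based seeding."""
    g = mean(points)
    centers = [min(points, key=lambda p: dist2(p, g))]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=100):
    """Plain Lloyd iterations starting from the deterministic seeds."""
    centers = init_centers(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers are final
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = mean(members)
    return labels, centers

def variance_ratio(points, labels, centers):
    """ASSUMPTION: Calinski-Harabasz-style index (larger is better),
    a stand-in for the paper's VCVI, whose formula is not in the abstract."""
    n, k = len(points), len(centers)
    g = mean(points)
    within = sum(dist2(p, centers[l]) for p, l in zip(points, labels))
    between = sum(labels.count(j) * dist2(centers[j], g) for j in range(k))
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))

def choose_k(points):
    """Scan K = 2 .. floor(sqrt(n)) and keep the K with the best index value."""
    k_max = math.isqrt(len(points))
    if k_max < 2:
        raise ValueError("need at least 4 points to scan K from 2 to sqrt(n)")
    best = None
    for k in range(2, k_max + 1):
        labels, centers = kmeans(points, k)
        score = variance_ratio(points, labels, centers)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best[1], best[2]

def make_blobs(seed=0, per_blob=30, spread=0.5):
    """Hypothetical toy data: three well-separated 2-D Gaussian blobs."""
    rng = random.Random(seed)
    anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [(rng.gauss(ax, spread), rng.gauss(ay, spread))
            for ax, ay in anchors for _ in range(per_blob)]
```

Calling `choose_k(make_blobs())` scans K from 2 to 9 on the 90 toy points and returns the selected K together with the cluster labels. Because the seeding is deterministic, repeated runs on the same data give the same partition, which is the stability property the abstract attributes to replacing random initial centers.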