Effective Clustering Analysis Based on New Designed CVI and Improved Clustering Algorithms

Erzhou Zhu, Binbin Zhu, Peng Wen, Feng Liu, Xuejun Li, Futian Wang
DOI: https://doi.org/10.1109/BDCloud.2018.00115
2018-01-01
Abstract: Because its results depend on parameter settings and on randomly selected initial clustering centers, the traditional K-means algorithm is unstable. A clustering validity index (CVI) is an important tool for evaluating the quality of the results produced by clustering algorithms. However, many existing CVIs are unstable, apply to only a narrow range of datasets, and cannot properly handle datasets with non-spherical distributions or with a large number of overlapping points. To address these problems, the traditional K-means algorithm is first improved by using the dynamic average distance to find the initial clustering centers rather than selecting them randomly. Second, based on the idea of dynamic average distance, a new clustering validity index, DCVI, is proposed. DCVI can handle many kinds of datasets, including non-convex datasets and datasets with a large number of overlapping points. Third, by integrating the improved K-means algorithm with DCVI, a new algorithm (KVOA) is designed to determine the optimal clustering number (Kopt) for a wide range of datasets. Experimental results on several datasets demonstrate that the improved K-means algorithm is more accurate and more stable than the traditional one. DCVI is also compared with six commonly used CVIs, and the experimental results show that it is more accurate and more stable than the others.
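The overall workflow described in the abstract (seed K-means deterministically, score each candidate clustering with a validity index, keep the best-scoring K) might look roughly like the sketch below. The abstract gives no formula for the dynamic average distance or for DCVI, so this sketch swaps in two well-known stand-ins: farthest-point seeding in place of the paper's seeding rule, and the mean silhouette coefficient in place of DCVI. All function names (`farthest_point_init`, `kmeans`, `silhouette`, `choose_k`) are hypothetical and are not taken from the paper.

```python
import math

def farthest_point_init(points, k):
    # Stand-in seeding (NOT the paper's dynamic-average-distance rule):
    # start from the point nearest the dataset mean, then repeatedly add
    # the point farthest from all centers chosen so far.
    dim = len(points[0])
    mean = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    centers = [min(points, key=lambda p: math.dist(p, mean))]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=100):
    # Plain Lloyd iterations with deterministic seeds, so repeated runs agree.
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers

def silhouette(points, labels):
    # Stand-in validity index (NOT DCVI): mean silhouette coefficient.
    clusters = {}
    for i, l in enumerate(labels):
        clusters.setdefault(l, []).append(i)
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(math.dist(p, points[j]) for j in idx) / len(idx)
                for l, idx in clusters.items() if l != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def choose_k(points, k_max):
    # Score every candidate K from 2 to k_max and return the best one.
    scores = {}
    for k in range(2, k_max + 1):
        labels, _ = kmeans(points, k)
        if len(set(labels)) >= 2:  # the index needs at least two clusters
            scores[k] = silhouette(points, labels)
    return max(scores, key=scores.get), scores

# Toy data: three well-separated 2-D blobs of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]
best_k, scores = choose_k(points, k_max=6)
print(best_k, scores)
```

Because the seeding is deterministic, repeated runs on the same data give the same partition, which is the stability property the abstract attributes to non-random initialization. Silhouette is known to favor compact, roughly convex clusters, so it would not reproduce the non-convex behavior claimed for DCVI.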