DVT-PKM: An Improved GPU Based Parallel K-Means Algorithm

Bo Yan, Ye Zhang, Zijiang Yang, Hongyi Su, Hong Zheng
DOI: https://doi.org/10.1007/978-3-319-09339-0_60
2014-01-01
Abstract: K-Means is a typical partition-based clustering algorithm. It has two major disadvantages: it is sensitive to the initial cluster centers, and outliers exert a significant influence on the clustering results. In addition, K-Means traverses and recomputes over all the data multiple times, so it is inefficient on large data sets. To overcome these limitations, this paper proposes to exclude outliers using the minimum number of points in a d-dimensional hypersphere area. The k cluster centers are then obtained by adjusting the threshold, making use of the idea of density. Finally, the K-Means algorithm is integrated with Compute Unified Device Architecture (CUDA), and time efficiency improves considerably by taking advantage of the computing power of the Graphics Processing Unit (GPU). We use the ratio of the distance between classes to the distance within classes, together with speedup, as the evaluation criteria. The experiments indicate that the proposed algorithm significantly improves the stability and running efficiency of the K-Means algorithm.
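As a rough illustration of two ideas named in the abstract, the sketch below filters outliers by counting neighbours inside a hypersphere and computes a between-class to within-class distance ratio. It is not the authors' implementation: the abstract gives no formulas, so the names `radius` and `min_pts`, the brute-force CPU neighbour search, and the specific ratio (mean centroid-to-centroid distance divided by mean point-to-centroid distance) are all assumptions made for this example.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def drop_outliers(points, radius, min_pts):
    """Keep points that have at least `min_pts` other points within `radius`.

    `radius` and `min_pts` are hypothetical parameter names. The O(n^2)
    scan is for illustration only; the paper targets a GPU.
    """
    kept = []
    for i, p in enumerate(points):
        neighbours = sum(
            1 for j, q in enumerate(points)
            if j != i and euclidean(p, q) <= radius
        )
        if neighbours >= min_pts:
            kept.append(p)
    return kept


def centroid(cluster):
    """Component-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def between_within_ratio(clusters):
    """Mean centroid-to-centroid distance over mean point-to-centroid distance.

    One plausible reading of the evaluation criterion (an assumption).
    Requires at least two clusters and a non-zero within-cluster spread.
    """
    centers = [centroid(c) for c in clusters]
    between = [
        euclidean(centers[i], centers[j])
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    ]
    within = [
        euclidean(p, centers[k])
        for k, c in enumerate(clusters)
        for p in c
    ]
    return (sum(between) / len(between)) / (sum(within) / len(within))


# Two tight groups of three points plus one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (50.0, 50.0)]
clean = drop_outliers(points, radius=2.0, min_pts=2)
ratio = between_within_ratio([clean[:3], clean[3:]])
```

On this toy data the isolated point at (50, 50) has no neighbours within the radius and is dropped, while the six grouped points are kept. A larger ratio indicates clusters that are well separated relative to their internal spread, which is the sense in which the abstract uses it as a quality measure.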