Abstract: Spatial clustering is one of the most important methods in spatial data mining. K-Medoids, a common but powerful spatial clustering algorithm, is applied in many fields, such as the generalization of spatial entity information, spatial point pattern analysis, and epidemiology. However, the K-Medoids algorithm faces two inherent challenges. First, it is sensitive to the selection of the initial medoids: different initial medoids may produce different clustering results, some of which are not optimal. Second, its time efficiency is unsatisfactory because many iterations are needed to find the most suitable partition. Existing studies on the K-Medoids algorithm do not consider result validity and time efficiency at the same time. Optimization methods such as the Genetic Algorithm have been applied to improve the validity of K-Medoids, but their time efficiency is not acceptable as data volumes grow. The MapReduce model has been used to handle high-volume data, but it is not suitable where computer clusters are unavailable. To improve both the result validity and the time efficiency of the algorithm, this paper revises the traditional K-Medoids algorithm, Partitioning Around Medoids (PAM), by combining it with the idea of the Simulated Annealing Algorithm (SAA), and proposes a parallel Simulated Annealing Partitioning Around Medoids (SAPAM) algorithm that is implemented efficiently on Graphics Processing Units (GPUs). The SAA is used to search for the initial medoids, which supports the validity of the results. The stochastic factor introduced by the SAA makes it possible to escape local optima and reach the globally optimal clustering result of PAM. To accelerate the clustering process, we design the parallel SAPAM algorithm to use a large number of GPU threads that execute concurrently.
By analogy with matrix multiplication, a new matrix computation method is defined to reduce the time spent transferring data between the GPU's global memory and shared memory. The method reuses data held in the GPU's shared memory and computes the distances between the medoids and many points at once, which clearly improves the algorithm's performance. To validate the proposed algorithm, we randomly generated eight datasets with different attributes and sizes and ran experiments on them to compare the proposed parallel SAPAM algorithm with the traditional PAM algorithm, the sequential SAPAM algorithm, and the parallel genetic K-Medoids algorithm. The experimental results show that the SAPAM algorithm attains more accurate clustering results than the traditional PAM algorithm and the parallel genetic K-Medoids algorithm. In addition, the proposed algorithm is more time-efficient than the sequential SAPAM algorithm and the parallel genetic K-Medoids algorithm. According to the results, our GPU-based SAPAM algorithm is four to eight times faster than the traditional PAM algorithm. These results demonstrate that the proposed method executes efficiently and produces valid results. Finally, the SAPAM algorithm was applied to the safety monitoring data of Guizhou province to obtain the clustering pattern of safety threats. The resulting clusters of safety threats may be of practical value to the government.
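
The idea of searching for medoids with a simulated-annealing acceptance rule can be illustrated with a minimal sequential sketch. It is not the paper's SAPAM implementation: the function names (`total_cost`, `anneal_medoids`), the swap move, the geometric cooling schedule, and the default parameters (`t0`, `cooling`, `steps`) are all assumptions chosen for illustration, and the GPU shared-memory distance computation is not reproduced here.

```python
import math
import random


def total_cost(points, medoid_idx):
    """Sum of Euclidean distances from each point to its nearest medoid."""
    return sum(
        min(math.dist(p, points[m]) for m in medoid_idx)
        for p in points
    )


def anneal_medoids(points, k, t0=1.0, cooling=0.95, steps=500, seed=0):
    """Search for k medoid indices using a simulated-annealing swap move.

    Assumes k < len(points). All parameters are illustrative defaults,
    not values taken from the paper.
    """
    rng = random.Random(seed)
    current = rng.sample(range(len(points)), k)
    current_cost = total_cost(points, current)
    best, best_cost = list(current), current_cost
    temp = t0
    for _ in range(steps):
        # Propose swapping one medoid for a randomly chosen non-medoid point.
        candidate = list(current)
        non_medoids = [i for i in range(len(points)) if i not in current]
        candidate[rng.randrange(k)] = rng.choice(non_medoids)
        cand_cost = total_cost(points, candidate)
        delta = cand_cost - current_cost
        # Always accept improvements; accept a worse configuration with
        # probability exp(-delta / temp) so the search can leave local optima.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        # Geometric cooling, floored to avoid division by zero.
        temp = max(temp * cooling, 1e-12)
    return sorted(best), best_cost
```

Calling `anneal_medoids(points, k)` on a list of coordinate tuples returns the best medoid indices found and their total cost. Because the best configuration seen so far is tracked separately, the returned cost is never worse than that of the random starting medoids.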
