Abstract: Spatial clustering is one of the most important methods in spatial data mining. As a common and powerful spatial clustering algorithm, K-Medoids is applied in many fields, such as the generalization of spatial entity information, spatial point pattern analysis, and epidemiology. However, the K-Medoids algorithm faces two inherent challenges. First, it is sensitive to the selection of the initial medoids: different initial medoids may yield different clustering results, some of which are non-optimal. Second, its time efficiency is unsatisfactory because many iterations are needed to find the most suitable partition. Existing studies on the K-Medoids algorithm do not address validity and time efficiency at the same time. Optimization methods such as the Genetic Algorithm have been applied to improve the validity of K-Medoids, but their time efficiency is not acceptable as data volumes grow. The MapReduce model has been used to handle high-volume data, but it cannot be applied where computer clusters are unavailable. To improve both the result validity and the time efficiency of the algorithm, this paper revises the traditional K-Medoids algorithm, Partitioning Around Medoids (PAM), by incorporating the idea of the Simulated Annealing Algorithm (SAA), and proposes a parallel Simulated Annealing Partitioning Around Medoids (SAPAM) algorithm that is implemented efficiently on Graphics Processing Units (GPUs). SAA is used to search for the initial medoids, which helps ensure the validity of the results. The stochastic factor introduced by SAA makes it possible to escape local optima and reach the globally optimal clustering result of PAM. To accelerate the clustering process, we design the parallel SAPAM algorithm to use a large number of GPU threads that execute concurrently.
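The simulated annealing search for medoids described above can be illustrated with a minimal sequential sketch. It is not the paper's implementation: the function names (`total_cost`, `sa_medoids`), the swap-based neighbour move, the geometric cooling schedule, and all parameter values are assumptions chosen for illustration, and the sketch assumes `k` is smaller than the number of points.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of Euclidean distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def sa_medoids(points, k, t0=1.0, cooling=0.95, steps=500, seed=0):
    """Search for k medoid indices with simulated annealing (illustrative only)."""
    rng = random.Random(seed)
    current = rng.sample(range(len(points)), k)
    cost = total_cost(points, current)
    best, best_cost = list(current), cost
    t = t0
    for _ in range(steps):
        # Neighbour move (assumed): swap one medoid for a random non-medoid.
        candidate = list(current)
        non_medoids = [i for i in range(len(points)) if i not in current]
        candidate[rng.randrange(k)] = rng.choice(non_medoids)
        cand_cost = total_cost(points, candidate)
        delta = cand_cost - cost
        # Always accept improvements; accept a worse move with
        # probability exp(-delta / t), the stochastic factor that
        # lets the search leave a local optimum.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, cost = candidate, cand_cost
            if cost < best_cost:
                best, best_cost = list(current), cost
        t *= cooling  # geometric cooling schedule (assumed)
    return sorted(best), best_cost
```

Calling `sa_medoids(points, k=2)` on a list of coordinate tuples returns the best medoid indices found and their total cost. A higher `t0` or slower cooling accepts more worsening moves early on, trading run time for a wider search.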
By analogy with matrix multiplication, a new matrix computation method is defined to reduce the time spent transferring data between the GPU's global memory and shared memory. The method reuses data held in the GPU's shared memory and computes the distances between the medoids and many points at a time, which clearly improves the algorithm's performance. To validate the proposed algorithm, we randomly generated eight datasets with different attributes and sizes and conducted experiments on them to compare the proposed parallel SAPAM algorithm with the traditional PAM algorithm, the sequential SAPAM algorithm, and the parallel genetic K-Medoids algorithm. The experimental results show that the SAPAM algorithm attained more accurate clustering results than the traditional PAM algorithm and the parallel genetic K-Medoids algorithm. In addition, the proposed algorithm was more time-efficient than the sequential SAPAM algorithm and the parallel genetic K-Medoids algorithm. According to the results, our GPU-based SAPAM algorithm was four to eight times faster than the traditional PAM algorithm. The results demonstrate that the proposed method executes efficiently and attains valid results. Finally, the SAPAM algorithm was applied to safety monitoring data from Guizhou Province to obtain the clustering pattern of safety threats. The resulting clusters of safety threats may be of practical value to the government.
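The data-reuse idea behind the matrix computation method can be sketched on the CPU as a blocked (tiled) distance computation, in the same spirit as tiled matrix multiplication. This is only an analogy under stated assumptions: the function names (`distance_matrix_tiled`, `assign`), the tile size, and the use of Python list slices to stand in for shared-memory loads are hypothetical, and the paper's actual GPU kernel is not reproduced here.

```python
import math


def distance_matrix_tiled(points, medoids, tile=4):
    """Return D[i][j] = distance(points[i], medoids[j]), computed tile by tile."""
    n, k = len(points), len(medoids)
    dist = [[0.0] * k for _ in range(n)]
    for i0 in range(0, n, tile):
        # Stand-in for loading a block of points into shared memory once.
        point_tile = points[i0:i0 + tile]
        for j0 in range(0, k, tile):
            # Stand-in for loading a block of medoids; every point in
            # the current tile reuses it without another "global" read.
            medoid_tile = medoids[j0:j0 + tile]
            for di, p in enumerate(point_tile):
                for dj, m in enumerate(medoid_tile):
                    dist[i0 + di][j0 + dj] = math.dist(p, m)
    return dist


def assign(points, medoids, tile=4):
    """Assign each point to the index of its nearest medoid."""
    dist = distance_matrix_tiled(points, medoids, tile)
    return [min(range(len(medoids)), key=row.__getitem__) for row in dist]
```

On a GPU, each tile would typically map to one thread block, with the two slices held in shared memory so that each loaded value contributes to many distance computations. The Python version produces the same distances as a naive double loop and only shows the access pattern.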

Research and Application of Accelerating Improved PAM Clustering Algorithm by GPU
