A multi-threaded particle swarm optimization-kmeans algorithm based on MapReduce

Xikang Wang, Tongxi Wang, Hua Xiang
DOI: https://doi.org/10.1007/s10586-024-04456-w
2024-04-08
Cluster Computing
Abstract: The particle swarm optimization-K-Means algorithm was proposed to improve the clustering accuracy of the K-Means algorithm. However, it adds computational burden, and its efficiency is low on large data sets. To solve this problem, a parallel particle swarm K-Means algorithm based on MapReduce with multi-threading is proposed. The algorithm performs parallel computation by dividing the particle swarm into several equal-sized sub-populations, based on the number of available nodes in the cluster, and distributing them across the nodes. It uses multi-threaded execution in the evaluation stage, which has the highest computational complexity in the evolutionary process. Experiments show that although splitting the population affects the optimization to some extent, the proposed algorithm can still effectively optimize the clustering results of the K-Means algorithm, and its computational efficiency is significantly improved compared with the serial particle swarm optimization K-Means algorithm and the MapReduce-based non-multithreaded particle swarm optimization K-Means algorithm. In the experiment with the largest dataset and a configuration of 16 nodes, the proposed algorithm is 58 times faster than the serial algorithm. Furthermore, computing efficiency can be improved further in clusters with more CPU cores.
computer science, information systems, theory & methods
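
The two ideas in the abstract, splitting the swarm into equal-sized sub-populations and evaluating particles with several threads, can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: the function names `split_swarm`, `fitness`, and `evaluate`, the choice of sum of squared distances as the fitness measure, and the thread count are all assumptions made for illustration, and the MapReduce layer is omitted (in a real cluster each sub-population would presumably be handled by one node).

```python
from concurrent.futures import ThreadPoolExecutor


def split_swarm(particles, n_nodes):
    """Divide the swarm into n_nodes equal-sized sub-populations."""
    if len(particles) % n_nodes != 0:
        raise ValueError("swarm size must be divisible by the node count")
    size = len(particles) // n_nodes
    return [particles[i * size:(i + 1) * size] for i in range(n_nodes)]


def fitness(centroids, points):
    """Assumed fitness: sum of squared distances to the nearest centroid.

    A particle is modelled here as a list of candidate centroids.
    """
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total


def evaluate(sub_population, points, n_threads=4):
    """Evaluate every particle of one sub-population in a thread pool."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # pool.map preserves the order of the input particles.
        return list(pool.map(lambda particle: fitness(particle, points),
                             sub_population))


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
    swarm = [
        [(0.0, 0.0), (10.0, 0.0)],
        [(1.0, 0.0), (10.0, 0.0)],
        [(0.5, 0.0), (10.0, 0.0)],
        [(0.0, 0.0), (1.0, 0.0)],
    ]
    for node_id, sub in enumerate(split_swarm(swarm, n_nodes=2)):
        print(node_id, evaluate(sub, points))
```

In CPython the global interpreter lock limits the speed-up for pure-Python arithmetic like this, so the sketch shows the structure of the approach rather than its performance.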