Parallel Gravitational Clustering Based on Grid Partitioning for Large-Scale Data

Chen Lei, Chen Fadong, Liu Zhaohua, Lv Mingyang, He Tingqin, Zhang Shiwen
DOI: https://doi.org/10.1007/s10489-022-03661-7
IF: 5.3
2023-01-01
Applied Intelligence
Abstract: The gravitational clustering algorithm is a dynamic clustering model that performs well at uncovering hidden clusters of arbitrary shape, density and distribution in complex datasets. This makes it well suited to mining irregular and unbalanced clusters from large-scale, noisy datasets. However, its prohibitive time overhead makes it impractical to apply at large scales. This paper therefore develops a parallel gravitational clustering algorithm based on grid partitioning (PGCGP). First, a grid partitioning strategy is designed to divide a large-scale dataset into multiple grids as evenly as possible. Second, a neighbourhood repair strategy is proposed to work with the gravitational clustering algorithm so that the clusters within a single grid are mined accurately. Finally, a border point alignment strategy is devised to decide whether two small clusters located in different grids should be merged, so that the real clusters of the original large dataset are recovered when the grids are combined. Extensive experiments on multiple artificial and real-world datasets verify that PGCGP achieves good performance.
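The partition-then-merge structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the partitioning rule or the border alignment criterion, so the quantile-based cuts, the distance threshold `eps`, and the function names `quantile_edges`, `grid_partition` and `merge_across_cells` are all assumptions made for illustration. The per-grid step (gravitational clustering with neighbourhood repair) is omitted; the local labels are assumed to come from whatever clusterer runs inside each cell.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]
Label = Tuple[Cell, int]  # (grid cell, local cluster id inside that cell)


def quantile_edges(values: Sequence[float], parts: int) -> List[float]:
    """Interior cut points splitting `values` into `parts` roughly equal-count bins."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // parts] for k in range(1, parts)]


def grid_partition(points: Sequence[Point], parts_x: int, parts_y: int) -> Dict[Cell, List[int]]:
    """Assign each point index to a grid cell (assumed rule: per-axis quantile cuts)."""
    x_edges = quantile_edges([p[0] for p in points], parts_x)
    y_edges = quantile_edges([p[1] for p in points], parts_y)
    cells: Dict[Cell, List[int]] = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(bisect_right(x_edges, x), bisect_right(y_edges, y))].append(idx)
    return dict(cells)


def merge_across_cells(points: Sequence[Point], labels: Sequence[Label], eps: float) -> List[Label]:
    """Merge local clusters from different cells when two of their points lie within `eps`.

    Assumed stand-in for border point alignment; uses union-find over local labels.
    The pairwise scan is quadratic and only meant for small illustrative inputs.
    """
    parent: Dict[Label, Label] = {lab: lab for lab in set(labels)}

    def find(a: Label) -> Label:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if labels[i][0] == labels[j][0]:
                continue  # same cell: already handled by the local clusterer
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if dx * dx + dy * dy <= eps * eps:
                root_i, root_j = find(labels[i]), find(labels[j])
                if root_i != root_j:
                    parent[root_j] = root_i
    return [find(lab) for lab in labels]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    cells = grid_partition(pts, parts_x=2, parts_y=1)
    # Treat every cell as one local cluster, purely for demonstration.
    local = [None] * len(pts)
    for cell, members in cells.items():
        for idx in members:
            local[idx] = (cell, 0)
    print(cells)
    print(merge_across_cells(pts, local, eps=1.0))
```

Because each cell is clustered independently in such a scheme, the per-cell work could be dispatched to separate workers, leaving only the cross-cell merge as a sequential step.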