Adaptive Gravitational Clustering Algorithm Integrated with Noise Detection
Juntao Yang, Lijun Yang, Wentong Wang, Tao Liu, Dongming Tang
DOI: https://doi.org/10.1016/j.eswa.2024.125733
IF: 8.5
2024-01-01
Expert Systems with Applications
Abstract: Clustering analysis is widely used in data mining, image processing, artificial intelligence, and other fields. Traditional approaches rely heavily on manually configured parameters, and the initial choice of these parameters strongly influences the clustering results. They also usually consider only the relationship between two individual samples when computing distances and ignore the overall structure of the dataset, which can degrade clustering performance. In addition, many contemporary algorithms are tailored to specific kinds of datasets and struggle to perform well on complex, noisy datasets. To address these limitations, we propose an Adaptive Gravitational Clustering Algorithm Integrated with Noise Detection, called GCIND. Inspired by the law of gravitation, GCIND takes into account the natural neighborhood structure of the entire dataset and adaptively computes the gravitation between data points from their shared neighbors and their Euclidean distances. The algorithm first identifies and removes outliers and edge points. It then uses gravitation to cluster the remaining core data automatically. Finally, it reassigns the removed points to their respective clusters. GCIND has four notable advantages: (1) it uses gravitation to build the neighborhood graph, so the graph reflects the overall structure of the dataset; (2) it is more robust on noisy datasets; (3) it clusters on an adaptive gravitational neighborhood graph, which eliminates manual parameter tuning; (4) it adapts to complex manifold-structured datasets, giving it broad applicability. Experiments show that GCIND, without requiring any parameter settings, performs slightly better than the algorithms it was compared with, especially on complex manifold datasets.
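The abstract describes a four-stage pipeline (natural-neighbor search, gravitation from shared neighbors and Euclidean distance, removal of noise and edge points, clustering of the core followed by reallocation) but does not give the exact formulas. The Python sketch below illustrates that pipeline under explicit assumptions and should not be read as the authors' method. The natural-neighbor search follows the common NaN-Searching idea (grow k until the number of points that appear in nobody's k-NN list stops changing), using a simplified stopping rule. Several pieces are hypothetical choices made only for illustration: the gravitation formula (1 + shared neighbors) / distance², the noise rule (a point with no mutual natural neighbor), the adaptive link threshold, and the function name `gcind_sketch`. The sketch also assumes distinct points, since duplicates would give a zero distance.

```python
import numpy as np
from collections import deque


def gcind_sketch(X):
    """Hypothetical GCIND-style clustering sketch; returns one label per row of X."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a point is never its own neighbor
    order = np.argsort(d, axis=1)        # self sorts last because of the inf

    # 1) Natural-neighbor search (simplified stopping rule): grow k until the
    #    count of points with no reverse neighbor is zero or stops changing.
    rev = np.zeros(n, dtype=int)
    lam, prev = 1, -1
    for k in range(1, n):
        rev += np.bincount(order[:, k - 1], minlength=n)
        zeros = int(np.sum(rev == 0))
        lam = k
        if zeros == 0 or zeros == prev:
            break
        prev = zeros

    A = np.zeros((n, n), dtype=bool)     # A[i, j]: j is among i's lam nearest
    A[np.repeat(np.arange(n), lam), order[:, :lam].ravel()] = True
    mutual = A & A.T                     # mutual (natural) neighbors

    # 2) ASSUMED gravitation: (1 + number of shared neighbors) / squared distance.
    snn = A.astype(int) @ A.T.astype(int)
    G = (1.0 + snn) / d**2               # diagonal is 0 because d is inf there

    # 3) ASSUMED noise/edge rule: a point with no mutual natural neighbor.
    core = mutual.any(axis=1)
    if not core.any():
        return np.zeros(n, dtype=int)
    core_idx = np.flatnonzero(core)

    # 4) ASSUMED adaptive threshold: the weakest pull between a core point and
    #    its lam-th nearest neighbor; link core pairs at least that strongly.
    tau = G[core_idx, order[core_idx, lam - 1]].min()
    link = (G >= tau) & core[:, None] & core[None, :]

    # Connected components of the gravitational graph become clusters (BFS).
    labels = np.full(n, -1)
    cluster = 0
    for s in core_idx:
        if labels[s] != -1:
            continue
        labels[s] = cluster
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(link[u]):
                if labels[v] == -1:
                    labels[v] = cluster
                    queue.append(v)
        cluster += 1

    # 5) Reallocate each removed point to the core point that attracts it most.
    for i in np.flatnonzero(~core):
        labels[i] = labels[core_idx[np.argmax(G[i, core_idx])]]
    return labels
```

As a usage example, take two parallel rows of evenly spaced points plus two distant outliers. The sketch gives each row its own label and attaches each outlier to the row nearer to it. The shared-neighbor term keeps the threshold adaptive because it is derived from the data's own neighbor scale rather than from a user-supplied radius. It is still only a stand-in for whatever graph construction GCIND actually uses.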