A self-adaptive and robust fission clustering algorithm via heat diffusion and maximal turning angle

Yu Han, Shizhan Lu, Haiyan Xu
DOI: https://doi.org/10.48550/arXiv.2102.03794
2021-02-07
Abstract: Cluster analysis, which focuses on the grouping and categorization of similar elements, is widely used in various fields of research. A novel and fast clustering algorithm, the fission clustering algorithm, was proposed in recent years. In this article, we propose a robust fission clustering (RFC) algorithm and a self-adaptive noise identification method. The RFC algorithm and the self-adaptive noise identification method are combined into a self-adaptive robust fission clustering (SARFC) algorithm. Several frequently used datasets were applied to test the performance of the proposed clustering approach and to compare the results with those of other algorithms. The comprehensive comparisons indicate that the proposed method has advantages over other common methods.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the challenges that existing clustering algorithms face when processing datasets obtained from automated monitoring devices. Specifically:
1. **Dependence on preset parameters**: Many existing clustering algorithms require users to set parameters in advance. For datasets obtained through automated monitoring devices, it is usually not known beforehand how these parameters should be set, so a clustering algorithm that adapts to the data is required.
2. **Difficulty with extremely challenging datasets**: Some existing clustering algorithms perform poorly on datasets with large density differences, complex shapes, or noise. For example, some density-based algorithms may fail to identify low-density clusters or may misidentify outliers as separate clusters.
3. **Computational efficiency**: Traditional clustering algorithms may encounter performance bottlenecks on large-scale data, so an algorithm that improves computational efficiency while preserving accuracy is required.
To solve these problems, the authors propose a Self-Adaptive and Robust Fission Clustering (SARFC) algorithm. The algorithm combines a robust fission clustering (RFC) algorithm with a self-adaptive noise identification method and aims to:
- **Remove the need for parameter setting**: The SARFC algorithm can process various datasets automatically, without any input parameters.
- **Handle complex datasets**: By introducing a heat diffusion density factor and the maximal turning angle method, the SARFC algorithm can better handle clusters with large density differences and complex shapes.
- **Improve computational efficiency**: The SARFC algorithm improves the efficiency of processing large-scale datasets through a divide-and-conquer strategy and parallel computing.
In conclusion, this paper aims to develop a clustering algorithm that is self-adaptive, robust, and efficient, to meet the challenges posed by complex datasets obtained from automated monitoring devices.
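The summary above does not spell out how the turning angle method works, so the following is only a generic illustration of the idea and not the paper's definition. It assumes that a threshold can be chosen without user input by sorting a one-dimensional score (for example, each point's nearest-neighbor distance), drawing the sorted values as a curve, and picking the point where the curve bends most sharply. The function name `max_turning_angle_index` and the choice of score are hypothetical.

```python
import math


def max_turning_angle_index(values):
    """Return the index (in sorted order) where the sorted curve bends most.

    Illustrative elbow heuristic, NOT the SARFC paper's exact procedure.
    Returns None when all values are equal (the curve is flat).
    """
    ys = sorted(values)
    n = len(ys)
    if n < 3:
        raise ValueError("need at least three values")
    span = ys[-1] - ys[0]
    if span == 0:
        return None

    # Normalize both axes to [0, 1] so the angles do not depend on units.
    pts = [(i / (n - 1), (y - ys[0]) / span) for i, y in enumerate(ys)]

    best_index, best_angle = None, -1.0
    for i in range(1, n - 1):
        # Incoming and outgoing segment vectors at point i.
        ax, ay = pts[i][0] - pts[i - 1][0], pts[i][1] - pts[i - 1][1]
        bx, by = pts[i + 1][0] - pts[i][0], pts[i + 1][1] - pts[i][1]
        # Unsigned angle between the two segments (cross and dot product).
        angle = abs(math.atan2(ax * by - ay * bx, ax * bx + ay * by))
        if angle > best_angle:
            best_index, best_angle = i, angle
    return best_index


# Hypothetical scores: five similar values followed by two much larger ones.
scores = [1.0, 1.1, 1.2, 1.3, 1.4, 5.0, 9.0]
elbow = max_turning_angle_index(scores)
suspected_noise = sorted(scores)[elbow + 1:]
```

In this toy input the sharpest bend falls at the last of the five similar values, so the two large scores end up in `suspected_noise`. No threshold is supplied by the user, which is the sense in which a heuristic of this kind is "self-adaptive".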