Abstract: Like k-means and the Gaussian Mixture Model (GMM), fuzzy c-means (FCM) with soft partitioning has become a popular clustering algorithm and is still extensively studied. However, these algorithms and their variants still suffer from difficulties such as determining the optimal number of clusters, which is a key factor for clustering quality. A common approach to this difficulty is the trial-and-validation strategy, i.e., traversing every integer from a large number such as $\sqrt{n}$ down to 2 until finding the number that corresponds to the peak value of some cluster validity index. With this strategy, however, it is scarcely possible to naturally construct an adaptively agglomerative hierarchical cluster structure. Even where it is possible, different validity indices lead to different numbers of clusters. To mitigate these problems, and motivated by convex clustering, in this paper we present a Centroid Auto-Fused Hierarchical Fuzzy c-means method (CAF-HFCM), whose optimization procedure automatically agglomerates clusters to form a hierarchy and, more importantly, yields an optimal number of clusters without resorting to any validity index. Although the recently proposed robust-learning fuzzy c-means (RL-FCM) can also obtain the best number of clusters automatically without any validity index, its 3 hyper-parameters are expensive to tune; by contrast, CAF-HFCM involves just 1 hyper-parameter, which makes tuning easier and more practical. Further, as an additional benefit of our optimization objective, CAF-HFCM effectively reduces the sensitivity of clustering performance to initialization. Moreover, CAF-HFCM can be straightforwardly extended to various variants of FCM.
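To make the trial-and-validation strategy mentioned in the abstract concrete, the following is a minimal sketch in plain Python. It runs standard FCM for every cluster count from $\lfloor\sqrt{n}\rfloor$ down to 2 and keeps the count with the highest validity score. The choice of validity index (Bezdek's partition coefficient), the function names `fcm`, `partition_coefficient` and `trial_and_validation`, and the synthetic data are all assumptions made for illustration; none of them come from the paper.

```python
import math
import random


def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (centroids, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([r / s for r in row])
    centroids = []
    for _ in range(iters):
        # Centroid update: mean of the points weighted by u_ij^m.
        centroids = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centroids.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )
        # Membership update: u_ij proportional to d_ij^(-2/(m-1)).
        shift = 0.0
        for i in range(n):
            d2 = [max(sum((points[i][d] - v[d]) ** 2 for d in range(dim)), 1e-12)
                  for v in centroids]
            inv = [x ** (-1.0 / (m - 1.0)) for x in d2]
            s = sum(inv)
            new_row = [x / s for x in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:
            break
    return centroids, u


def partition_coefficient(u):
    """Bezdek's partition coefficient (assumed index); lies in [1/c, 1]."""
    return sum(x * x for row in u for x in row) / len(u)


def trial_and_validation(points):
    """Try every c from floor(sqrt(n)) down to 2 and keep the best score."""
    c_max = max(2, int(math.sqrt(len(points))))
    scores = {}
    for c in range(c_max, 1, -1):
        _, u = fcm(points, c)
        scores[c] = partition_coefficient(u)
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical data: three Gaussian blobs in the plane.
    data = [[rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)]
            for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(20)]
    best_c, scores = trial_and_validation(data)
    print("selected c:", best_c)
    print("scores:", scores)
```

Note that every candidate count requires a full FCM run, and that swapping in a different validity index can change the selected count. These are the two costs of the strategy that the abstract points to.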

What problem does this paper attempt to address?

This paper addresses the difficulty that the Fuzzy C-Means (FCM) clustering algorithm and its variants have in determining the optimal number of clusters in data. Specifically, it points out that traditional clustering algorithms such as K-means, the Gaussian Mixture Model (GMM) and FCM face the following challenges:

1. **Difficulty in determining the optimal number of clusters**: These algorithms usually require the number of clusters to be specified in advance, and this parameter is crucial for clustering quality. A poor choice may lead to poor clustering results.
2. **Inefficiency of relying on validation strategies**: The commonly used trial-and-validation strategy traverses a series of candidate numbers of clusters until the maximum value of a certain validity index is found. This is computationally costly, and different validity indices may lead to different numbers of clusters.
3. **Lack of adaptive hierarchical structure**: With existing methods it is difficult to construct an adaptive hierarchical clustering structure naturally, which limits in-depth understanding of the data structure.

To solve these problems, the paper proposes a new method, Centroid Auto-Fused Hierarchical Fuzzy c-Means (CAF-HFCM). Its main features are as follows:

- **Automatically determine the optimal number of clusters**: By automatically fusing cluster centers during the optimization process, CAF-HFCM can determine the optimal number of clusters without relying on any validity index.
- **Form a hierarchical clustering structure**: During the optimization process, as the regularization parameter gradually increases, a hierarchical clustering structure is generated naturally, providing interpretations of the data at different granularities.
- **Reduce initialization sensitivity**: Compared with traditional FCM, CAF-HFCM is less sensitive to initialization, which improves the stability and reliability of clustering.
- **Highly extensible**: CAF-HFCM can be easily extended to various variants of FCM.

Through these improvements, CAF-HFCM aims to improve the performance and efficiency of the clustering algorithm, especially when dealing with complex data sets.
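The idea that fused cluster centers directly give a cluster count can be illustrated with a small helper. The sketch below is not the paper's optimization procedure, whose objective and update rules are not given on this page. It only shows, under the assumption that fused centroids end up within a small tolerance `tol` of each other, how the number of distinct clusters could be read off a set of centroids. The function name `count_fused_centroids` and the tolerance value are hypothetical.

```python
import math


def count_fused_centroids(centroids, tol=1e-3):
    """Count groups of centroids that lie within `tol` of each other.

    Centroids are linked transitively (union-find), so a chain of nearly
    coincident centroids counts as a single fused cluster.
    """
    parent = list(range(len(centroids)))

    def find(a):
        # Follow parent links to the group representative, halving the path.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a in range(len(centroids)):
        for b in range(a + 1, len(centroids)):
            if math.dist(centroids[a], centroids[b]) <= tol:
                parent[find(a)] = find(b)
    return len({find(a) for a in range(len(centroids))})


# Hypothetical centroids: five were initialised, two pairs have fused.
example = [[0.0, 0.0], [0.0, 0.0005], [5.0, 5.0], [5.0002, 5.0], [9.0, 1.0]]
print(count_fused_centroids(example))  # 3
```

Repeating such a count along an increasing sequence of regularization values would give one cluster count per level, which is one way to picture the hierarchy described above.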

A Centroid Auto-Fused Hierarchical Fuzzy c-Means Clustering

Fuzzy C-mean Clustering Based on Ant Algorithm

Adaptive Approach to Fuzzy Clustering

Modified Fuzzy Clustering with Segregated Cluster Centroids.

Improvement and optimization of a fuzzy C-means clustering algorithm

Improved Clustering Algorithm Based on Modified Fuzzy C-means Applied to the Traffic

Improved fuzzy c-means clustering by varying the fuzziness parameter

Accelerated Fuzzy C-Means Clustering Based on New Affinity Filtering and Membership Scaling

An Improved Clustering Algorithm for Information Granulation

FRCM: A fuzzy rough c-means clustering method

Interval-valued possibilistic fuzzy C-means clustering algorithm

A Novel FCM's Initial Parameters Acquisition Method

An equidistance index intuitionistic fuzzy c-means clustering algorithm based on local density and membership degree boundary

From Soft Clustering to Hard Clustering: A Collaborative Annealing Fuzzy $c$-means Algorithm

Fractional Derivative to Symmetrically Extend the Memory of Fuzzy C-Means

A Generalization of Distance Functions for Fuzzy c-Means Clustering With Centroids of Arithmetic Means

A simple and fast method to determine the parameters for fuzzy c-means cluster validation

Adaptive fuzzy C-means clustering integrated with local outlier factor

Fuzzy C-Means Clustering Validity Function Based on Multiple Clustering Performance Evaluation Components

Adaptive Fuzzy Clustering Model Based on Internal Connectivity of All Data Points

Adaptive Fuzzy C-Means with Graph Embedding