MeanCut: A Greedy-Optimized Graph Clustering via Path-based Similarity and Degree Descent Criterion

Dehua Peng, Zhipeng Gui, Huayi Wu
2023-12-07
Abstract: As the most typical graph clustering method, spectral clustering is popular and attractive due to its remarkable performance, easy implementation, and strong adaptability. Classical spectral clustering measures the edge weights of the graph using a pairwise Euclidean-based metric, and solves for the optimal graph partition by relaxing the constraints on the indicator matrix and performing Laplacian decomposition. However, Euclidean-based similarity might cause skewed graph cuts when handling non-spherical data distributions, and the relaxation strategy introduces information loss. Meanwhile, spectral clustering requires specifying the number of clusters, which is hard to determine without enough prior knowledge. In this work, we leverage path-based similarity to enhance intra-cluster associations, propose MeanCut as the objective function, and greedily optimize it in degree descending order for a nondestructive graph partition. This algorithm enables the identification of arbitrarily shaped clusters and is robust to noise. To reduce the computational complexity of similarity calculation, we transform optimal path search into generating the maximum spanning tree (MST), and develop a fast MST (FastMST) algorithm to further improve its time efficiency. Moreover, we define a density gradient factor (DGF) for separating weakly connected clusters. The validity of our algorithm is demonstrated by testing on real-world benchmarks and an application to face recognition. The source code of MeanCut is available at https://github.com/ZPGuiGroupWhu/MeanCut-Clustering.
Machine Learning
What problem does this paper attempt to address?
The paper aims to address several key issues in graph clustering:

1. **Information Loss**: Classical spectral clustering relaxes the constraints on the indicator matrix and then relies on an auxiliary clustering algorithm such as K-means to recover discrete labels. This relaxation introduces information loss.
2. **Non-Spherical Data Handling**: Similarity measures based on Euclidean distance may produce skewed cuts when the data distribution is non-spherical.
3. **Presetting the Number of Clusters**: Spectral clustering requires the number of clusters to be specified in advance, which is difficult without prior knowledge.
4. **Noise Sensitivity**: Traditional spectral clustering methods are quite sensitive to noise points.

To address these issues, the authors propose a graph clustering algorithm called MeanCut, which greedily optimizes a MeanCut objective function using path-based similarity and a degree descent criterion. Specifically, the algorithm uses path-based similarity to enhance intra-cluster associations and performs a nondestructive graph partition by processing vertices in descending order of degree. To reduce the cost of computing the similarity, the optimal path search is transformed into generating a maximum spanning tree, and a fast MST algorithm (FastMST) further improves time efficiency. A Density Gradient Factor (DGF) is defined to separate weakly connected clusters. Together, these components allow MeanCut to identify clusters of arbitrary shape and to remain robust to noise.
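
The link between path-based similarity and the maximum spanning tree can be illustrated with a small sketch. It rests on one common reading of path-based similarity, which is an assumption here because the text above does not spell out the definition: the similarity of two vertices is the largest bottleneck (minimum edge weight) over all paths joining them. Under that reading, the value equals the smallest edge weight on the path between the two vertices in a maximum spanning tree, so a single Kruskal-style pass yields all pairs. The function names, the Gaussian edge weight, and the `sigma` parameter are hypothetical choices for illustration. This is not the authors' FastMST, and neither the MeanCut objective nor the DGF is reproduced.

```python
from itertools import combinations
from math import dist, exp


def gaussian_weights(points, sigma=1.0):
    """Dense Euclidean-based similarity matrix (illustrative choice of kernel)."""
    n = len(points)
    return [
        [exp(-dist(points[i], points[j]) ** 2 / (2 * sigma ** 2)) for j in range(n)]
        for i in range(n)
    ]


def path_based_similarity(weights):
    """All-pairs max-min path similarity from a symmetric weight matrix.

    Edges are visited from heaviest to lightest (Kruskal's order for a
    maximum spanning tree). When an edge of weight w joins two components,
    w is the bottleneck for every pair of vertices split across them.
    """
    n = len(weights)
    root = list(range(n))                 # vertex -> id of its component
    members = {i: [i] for i in range(n)}  # component id -> its vertices
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = weights[i][i]

    edges = sorted(
        ((weights[i][j], i, j) for i, j in combinations(range(n), 2)),
        reverse=True,
    )
    for w, i, j in edges:
        ri, rj = root[i], root[j]
        if ri == rj:
            continue  # edge would close a cycle, so it is not a tree edge
        for u in members[ri]:
            for v in members[rj]:
                sim[u][v] = sim[v][u] = w
        # Merge the smaller component into the larger one.
        if len(members[ri]) < len(members[rj]):
            ri, rj = rj, ri
        for v in members[rj]:
            root[v] = ri
        members[ri].extend(members.pop(rj))
    return sim


# Toy data: an elongated chain of four points and a separate pair.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 5), (1, 5)]
weights = gaussian_weights(points)
sim = path_based_similarity(weights)

print("direct weight, chain ends:   ", weights[0][3])
print("path similarity, chain ends: ", sim[0][3])
print("path similarity, across gap: ", sim[0][4])
```

In this toy example the two ends of the chain have a small direct Gaussian weight but a much higher path-based similarity, because every hop along the chain is short. Similarity between the chain and the separate pair stays near zero. This is the effect described above as enhancing intra-cluster associations for non-spherical clusters.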