Modularity aided consistent attributed graph clustering via coarsening

Samarth Bhatia, Yukti Makhija, Manoj Kumar, Sandeep Kumar
2024-07-09
Abstract: Graph clustering is an important unsupervised learning technique for partitioning graphs with attributes and detecting communities. However, current methods struggle to accurately capture true community structures and intra-cluster relations, be computationally efficient, and identify smaller communities. We address these challenges by integrating coarsening and modularity maximization, effectively leveraging both adjacency and node features to enhance clustering accuracy. We propose a loss function incorporating log-determinant, smoothness, and modularity components using a block majorization-minimization technique, resulting in superior clustering outcomes. The method is theoretically consistent under the Degree-Corrected Stochastic Block Model (DC-SBM), ensuring asymptotic error-free performance and complete label recovery. Our provably convergent and time-efficient algorithm seamlessly integrates with graph neural networks (GNNs) and variational graph autoencoders (VGAEs) to learn enhanced node features and deliver exceptional clustering performance. Extensive experiments on benchmark datasets demonstrate its superiority over existing state-of-the-art methods for both attributed and non-attributed graphs.
Machine Learning, Social and Information Networks
What problem does this paper attempt to address?
This paper addresses graph clustering, in particular the clustering of graphs whose nodes carry attributes. Existing methods have difficulty capturing true community structures, remaining computationally efficient, and identifying small communities. To address these issues, the paper combines coarsening with modularity maximization, using both the graph's adjacency structure and its node features to improve clustering accuracy. It minimizes a loss function with log-determinant, smoothness, and modularity components using a block majorization-minimization technique. The method is theoretically consistent under the degree-corrected stochastic block model (DC-SBM), which guarantees asymptotically error-free performance and complete label recovery. The proposed algorithm is provably convergent and time-efficient, and it integrates with graph neural networks (GNNs) and variational graph autoencoders (VGAEs) to learn enhanced node features and improve clustering performance. Experiments on benchmark datasets show that the method outperforms existing state-of-the-art methods on both attributed and non-attributed graphs. In addition, the method can reveal relationships between clusters and give insight into the characteristics of each cluster, which is particularly useful for preliminary analysis of large unlabeled datasets.

The main contributions of the paper are:
1. The first optimization-based framework for attributed graph clustering that combines coarsening and modularity maximization.
2. Proofs of weak and strong consistency under the DC-SBM, guaranteeing asymptotically error-free performance and complete label recovery.
3. Integration with GNN architectures (GNNs and VGAEs) to further improve clustering performance.
4. Extensive experiments on real-world and synthetic datasets that validate the method's superiority.

In summary, the paper introduces a graph clustering method that integrates coarsening and modularity maximization to improve clustering on attributed graphs, addresses several limitations of existing methods, and supports its effectiveness with theoretical guarantees and empirical evidence.
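The summary names three loss components (log-determinant, smoothness, modularity) without giving their form. As a rough illustration only, the sketch below evaluates a hypothetical objective that combines them for a given cluster-assignment matrix C: the log-determinant of the coarsened Laplacian C^T L C shifted by J = 11^T/k, a smoothness trace of coarsened features on that Laplacian, and the standard Newman modularity of the partition, plus an assumed feature-reconstruction term. The exact form, the weights (gamma, alpha, beta), the normalization, and all names (loss_terms, one_hot, X_tilde) are assumptions rather than the authors' formulation, and the block majorization-minimization updates that would actually optimize C and X_tilde are not implemented.

```python
import numpy as np

def one_hot(labels, k):
    """Hard (n, k) assignment matrix from integer cluster labels."""
    C = np.zeros((len(labels), k))
    C[np.arange(len(labels)), labels] = 1.0
    return C

def loss_terms(A, X, C, X_tilde, gamma=1.0, alpha=1.0, beta=1.0):
    """Evaluate a HYPOTHETICAL coarsening + modularity objective (lower is better).

    A: (n, n) symmetric adjacency matrix; X: (n, p) node features;
    C: (n, k) assignment matrix whose rows sum to 1;
    X_tilde: (k, p) coarsened, cluster-level features.
    gamma, alpha, beta: illustrative weights, not values from the paper.
    """
    k = C.shape[1]
    d = A.sum(axis=1)                          # node degrees
    two_m = d.sum()                            # 2m = sum of degrees
    L = np.diag(d) - A                         # combinatorial graph Laplacian
    L_c = C.T @ L @ C                          # Laplacian of the coarsened graph
    J = np.ones((k, k)) / k                    # rank-one shift removing L_c's zero eigenvalue
    sign, logdet = np.linalg.slogdet(L_c + J)  # log-determinant term
    if sign <= 0:
        raise ValueError("C^T L C + J is singular; is the coarsened graph disconnected?")
    smooth = np.trace(X_tilde.T @ L_c @ X_tilde)  # feature smoothness on the coarsened graph
    recon = np.linalg.norm(C @ X_tilde - X) ** 2  # assumed term ||C X_tilde - X||_F^2
    B = A - np.outer(d, d) / two_m               # Newman modularity matrix
    Q = np.trace(C.T @ B @ C) / two_m            # modularity of the (soft) partition
    total = -gamma * logdet + smooth + 0.5 * alpha * recon - beta * Q
    return {"logdet": logdet, "smoothness": smooth,
            "reconstruction": recon, "modularity": Q, "total": total}

# Toy graph: triangles {0,1,2} and {3,4,5} joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
X = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3)   # one feature direction per triangle

C = one_hot([0, 0, 0, 1, 1, 1], 2)                  # planted partition
X_tilde = np.linalg.lstsq(C, X, rcond=None)[0]      # least-squares cluster features
terms = loss_terms(A, X, C, X_tilde)
for name, value in terms.items():
    print(f"{name}: {value:.4f}")
```

For the planted partition of this toy graph, the reconstruction term is zero and the modularity is 5/14, while a partition that splits both triangles ([0, 1, 0, 1, 0, 1]) has modularity -3/14. The total value depends heavily on the arbitrary weights chosen here, so it should not be read as reproducing the paper's objective or its behavior.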