Reliable Node Similarity Matrix Guided Contrastive Graph Clustering

Yunhui Liu, Xinyi Gao, Tieke He, Tao Zheng, Jianhua Zhao, Hongzhi Yin
2024-08-07
Abstract: Graph clustering, which involves the partitioning of nodes within a graph into disjoint clusters, holds significant importance for numerous subsequent applications. Recently, contrastive learning, known for utilizing supervisory information, has demonstrated encouraging results in deep graph clustering. This methodology facilitates the learning of favorable node representations for clustering by attracting positively correlated node pairs and distancing negatively correlated pairs within the representation space. Nevertheless, a significant limitation of existing methods is their inadequacy in thoroughly exploring node-wise similarity. For instance, some hypothesize that the node similarity matrix within the representation space is identical, ignoring the inherent semantic relationships among nodes. Given the fundamental role of instance similarity in clustering, our research investigates contrastive graph clustering from the perspective of the node similarity matrix. We argue that an ideal node similarity matrix within the representation space should accurately reflect the inherent semantic relationships among nodes, ensuring the preservation of semantic similarities in the learned representations. In response to this, we introduce a new framework, Reliable Node Similarity Matrix Guided Contrastive Graph Clustering (NS4GC), which estimates an approximately ideal node similarity matrix within the representation space to guide representation learning. Our method introduces node-neighbor alignment and semantic-aware sparsification, ensuring the node similarity matrix is both accurate and efficiently sparse. Comprehensive experiments conducted on $8$ real-world datasets affirm the efficacy of learning the node similarity matrix and the superior performance of NS4GC.
Machine Learning
What problem does this paper attempt to address?
The paper addresses how to make effective use of node-to-node similarity in contrastive graph clustering. Existing contrastive methods do not thoroughly explore node-wise similarity: some, for instance, assume that the node similarity matrix within the representation space is identical, which ignores the inherent semantic relationships among nodes. As a result, the learned node representations do not preserve semantic similarity well. To address this, the authors propose a new framework, Reliable Node Similarity Matrix Guided Contrastive Graph Clustering (NS4GC), which estimates an approximately ideal node similarity matrix within the representation space and uses it to guide representation learning. Through node-neighbor alignment and semantic-aware sparsification, NS4GC aims to keep this matrix both accurate and efficiently sparse. Experiments on 8 real-world datasets support the value of learning the node similarity matrix and show superior clustering performance for NS4GC.
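To make the idea of a node similarity matrix more concrete, the following is a minimal illustrative sketch in plain Python. It is not the NS4GC implementation: the cosine similarity measure, the hard threshold in `sparsify`, the `neighbor_alignment` score, and the toy embeddings and edges are all assumptions chosen only to show what "similarity matrix", "sparsification", and "node-neighbor alignment" could look like in the simplest case.

```python
import math


def cosine_similarity_matrix(embeddings):
    """Return the pairwise cosine similarity of the row vectors."""
    normalized = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        # Assumption: a zero vector is kept as all zeros instead of raising.
        normalized.append([x / norm if norm > 0 else 0.0 for x in vec])
    return [
        [sum(a * b for a, b in zip(u, v)) for v in normalized]
        for u in normalized
    ]


def sparsify(similarity, threshold):
    """Zero out off-diagonal entries below `threshold`.

    Hypothetical hard-threshold rule, used here only for illustration.
    """
    n = len(similarity)
    return [
        [
            similarity[i][j] if (i == j or similarity[i][j] >= threshold) else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def neighbor_alignment(similarity, edges):
    """Mean similarity over graph edges; higher means neighbors sit closer."""
    return sum(similarity[i][j] for i, j in edges) / len(edges)


# Toy data (assumed): four nodes forming two obvious groups, {0, 1} and {2, 3}.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
edges = [(0, 1), (2, 3)]

sim = cosine_similarity_matrix(embeddings)
sparse_sim = sparsify(sim, threshold=0.5)
alignment = neighbor_alignment(sim, edges)
```

In this toy setting, entries linking the two groups fall below the threshold and are zeroed, while within-group entries survive, so the sparse matrix reflects the group structure. The framework described above learns representations so that the similarity matrix takes such a form, rather than thresholding fixed embeddings as done here.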