Self-supervised Contrastive Attributed Graph Clustering

Wei Xia, Quanxue Gao, Ming Yang, Xinbo Gao
DOI: https://doi.org/10.1109/TMM.2022.3213208
2021-10-15
Abstract: Attributed graph clustering, which learns node representations from node attributes and the topological graph for clustering, is a fundamental but challenging task in graph analysis. Recently, methods based on graph contrastive learning (GCL) have obtained impressive clustering performance on this task. Yet, we observe that existing GCL-based methods 1) fail to benefit from imprecise clustering labels; 2) require a post-processing operation to get clustering labels; 3) cannot solve the out-of-sample (OOS) problem. To address these issues, we propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC). In SCAGC, by leveraging inaccurate clustering labels, a self-supervised contrastive loss, which aims to maximize the similarities of intra-cluster nodes while minimizing the similarities of inter-cluster nodes, is designed for node representation learning. Meanwhile, a clustering module is built to directly output clustering labels by contrasting the representations of different clusters. Thus, for OOS nodes, SCAGC can directly calculate their clustering labels. Extensive experimental results on four benchmark datasets show that SCAGC consistently outperforms 11 competitive clustering methods.
Machine Learning, Artificial Intelligence
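The abstract describes a contrastive loss that uses (possibly inaccurate) cluster labels to pull intra-cluster nodes together and push inter-cluster nodes apart. The following is a minimal NumPy sketch of that idea, not the authors' implementation. It assumes two augmented views `m1` and `m2` of the node embeddings, cosine similarity as the similarity function, positives defined as other nodes sharing a pseudo-label, and a denominator taken over nodes with a different pseudo-label. The function names, the temperature value, and the toy data are all hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def pseudo_label_contrastive_loss(m1, m2, pseudo_labels, tau=0.5):
    """Pseudo-label-guided contrastive loss over two views (illustrative sketch).

    Assumption: positives of node i are the other nodes with the same
    pseudo-label; the denominator sums over nodes with a different label.
    """
    pseudo_labels = np.asarray(pseudo_labels)
    n = len(pseudo_labels)
    same = pseudo_labels[:, None] == pseudo_labels[None, :]
    positives = same & ~np.eye(n, dtype=bool)  # same cluster, excluding i itself
    negatives = ~same                          # different cluster

    # exp(similarity / tau) for the four view pairs (1,1), (1,2), (2,1), (2,2).
    views = (m1, m2)
    exp_sims = [np.exp(cosine_similarity(va, vb) / tau)
                for va in views for vb in views]

    # Per-node denominator: summed over all view pairs and all negatives.
    denom = sum((e * negatives).sum(axis=1) for e in exp_sims)

    total = 0.0
    for i in range(n):
        n_pos = positives[i].sum()
        if n_pos == 0 or denom[i] == 0:
            continue  # node has no positives or no negatives; skip it
        log_ratios = sum(np.log(e[i, positives[i]] / denom[i]).sum()
                         for e in exp_sims)
        total -= log_ratios / n_pos
    return float(total)


# Toy data: two well-separated clusters of four nodes each.
centers = np.array([[1.0, 0.0], [0.0, 1.0]])
true_clusters = np.array([0, 0, 0, 0, 1, 1, 1, 1])
rng = np.random.default_rng(0)
m1 = centers[true_clusters] + 0.01 * rng.normal(size=(8, 2))
m2 = centers[true_clusters] + 0.01 * rng.normal(size=(8, 2))

loss_consistent = pseudo_label_contrastive_loss(m1, m2, true_clusters)
loss_scrambled = pseudo_label_contrastive_loss(m1, m2, [0, 1, 0, 1, 0, 1, 0, 1])
print(loss_consistent, loss_scrambled)
```

On this toy data, labels that agree with the geometry of the embeddings give a lower loss than scrambled labels, which is the signal such a loss would pass back to the encoder. A real training setup would compute this with an autodiff framework on mini-batches rather than with a Python loop.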
What problem does this paper attempt to address?
The paper targets three limitations of existing graph contrastive learning (GCL) based methods for attributed graph clustering:

1. **Unable to benefit from imprecise clustering labels**: Existing GCL methods do not exploit inaccurate clustering labels during training, which limits their performance.
2. **Require post-processing operations to obtain clustering labels**: These methods usually need an additional step to generate the final clustering labels, which may lead to sub-optimal node representations.
3. **Unable to solve the out-of-sample (OOS) problem**: Existing methods cannot directly handle unseen nodes, which limits their use in practical engineering applications.

To solve these problems, the authors propose a new network, Self-supervised Contrastive Attributed Graph Clustering (SCAGC). The main improvements of SCAGC include:

- **Utilizing imprecise clustering labels**: A self-supervised contrastive loss maximizes the similarity between nodes within the same cluster and minimizes the similarity between nodes in different clusters.
- **Directly outputting clustering labels**: A clustering module outputs clustering labels directly by contrasting the representations of different clusters.
- **Handling out-of-sample nodes**: For out-of-sample nodes, SCAGC can calculate clustering labels directly, without retraining on the entire graph.

### Formula Summary

1. **Node Representation Learning Module**:
   \[ Z^{(v)} = P(X^{(v)}, G^{(v)} \mid \Omega_1) = \sigma\left((\tilde{D}^{(v)})^{-\frac{1}{2}}\,\tilde{G}^{(v)}\,(\tilde{D}^{(v)})^{-\frac{1}{2}}\,X^{(v)}\,\Omega_1\right) \]
   \[ Z^{(v)} = P(Z^{(v)}, G^{(v)} \mid \Omega_2) \]
2. **Self-supervised Contrastive Loss**:
   \[ L_i = -\frac{1}{|\Delta_i|}\sum_{t\in\Delta_i}\sum_{\alpha,\beta = 1}^{2}\log\frac{e^{\mathcal{S}(m_i^{(\alpha)},m_t^{(\beta)})/\tau_2}}{\sum_{\alpha',\beta' = 1}^{2}\sum_{q\in\nabla_i}e^{\mathcal{S}(m_i^{(\alpha')},m_q^{(\beta')})/\tau_2}} \]
   \[ L_{SGC}=\min_{\Omega,\phi}\sum_{i = 1}^{N} L_i \]
3. **Contrastive Clustering Loss**:
   \[ L(\hat{\ell}_k^{(1)},\hat{\ell}_k^{(2)})=-\log\frac{e^{\mathcal{S}(\hat{\ell}_k^{(1)},\hat{\ell}_k^{(2)})/\tau_1}}{\sum_{j = 1}^{K} e^{\mathcal{S}(\hat{\ell}_k^{(1)},\hat{\ell}_j^{(1)})/\tau_1}+\sum_{j = 1}^{K} e^{\mathcal{S}(\hat{\ell}_k^{(1)},\hat{\ell}_j^{(2)})/\tau_1}} \]
   \[ L_{CC}=\min_{\Omega,\psi}\frac{1}{2K}\sum_{k = 1}^K[L(\hat{\ell}_k^{(1)},\hat{\ell}_k^{(2)})+L(\hat{\ell}