Graph Representation Learning via Contrasting Cluster Assignments

Chunyang Zhang, Hongyu Yao, C. L. Philip Chen, Yuena Lin
DOI: https://doi.org/10.48550/arXiv.2112.07934
2021-12-15
Abstract: With the rise of contrastive learning, unsupervised graph representation learning has been booming recently, even surpassing its supervised counterparts in some machine learning tasks. Most existing contrastive models for graph representation learning either focus on maximizing mutual information between local and global embeddings, or primarily depend on contrasting embeddings at the node level. However, they are still not refined enough to comprehensively explore the local and global views of network topology. Although the former considers the local-global relationship, its coarse global information leads to poor cooperation between local and global views. The latter pays attention to node-level feature alignment, so the role of the global view appears inconspicuous. To avoid falling into these two extreme cases, we propose a novel unsupervised graph representation model based on contrasting cluster assignments, called GRCCA. It is motivated to make good use of local and global information jointly by combining clustering algorithms and contrastive learning. This not only facilitates the contrastive effect, but also provides higher-quality graph information. Meanwhile, GRCCA further excavates cluster-level information, which gives it insight into the elusive associations between nodes beyond graph topology. Specifically, we first generate two augmented graphs with distinct graph augmentation strategies, then employ clustering algorithms to obtain their cluster assignments and prototypes respectively. The proposed GRCCA further compels identical nodes from different augmented graphs to recognize their cluster assignments mutually by minimizing a cross-entropy loss. To demonstrate its effectiveness, we compare it with state-of-the-art models on three different downstream tasks. The experimental results show that GRCCA is strongly competitive in most tasks.
Machine Learning
What problem does this paper attempt to address?
The paper addresses the failure of existing contrastive learning models to fully explore the relationship between local and global views in unsupervised graph representation learning. Specifically:

1. **Insufficient Local-Global Relationship**: Existing contrastive learning models either focus on maximizing the mutual information between local and global embeddings or rely mainly on node-level contrast. In the former, coarse global information limits the cooperation between local and global views; the latter concentrates on node-level feature alignment, so the global view plays little role.
2. **Lack of a Balanced Aggregation Method**: These methods are biased towards either the local view or the global view, and none of them balances the two well enough to comprehensively explore the local and global views of network topology.

To solve these problems, the authors propose a new unsupervised graph representation learning model, Graph Representation Learning via Contrasting Cluster Assignments (GRCCA). The model uses clustering algorithms to capture finer-grained, cluster-level global information and to uncover hidden associations between nodes that go beyond the graph topology. GRCCA also contrasts embeddings at the node level to preserve the quality of local information, but it explores global information by enforcing cluster-level consistency rather than node-level consistency.

### Main Contributions

1. **Novel Graph Representation Learning Model**: For the first time, the contrastive cluster assignment mechanism is adopted as an unsupervised learning method.
2. **Balance Global and Local Views**: By combining clustering algorithms and contrastive learning, GRCCA balances global and local views, which both strengthens the contrastive effect and provides higher-quality graph information.
3. **Mine Cluster-Level Information**: GRCCA further mines cluster-level information in the feature space to capture the abstract similarity between nodes, which reveals the essential patterns of the graph and generalizes the proximity hypothesis and self-identification.
4. **Verify Effectiveness by Experiments**: Experimental results on three different tasks show the effectiveness of GRCCA. In node classification, GRCCA outperforms the latest unsupervised models and even exceeds some supervised models. It is also competitive in link prediction and community detection, achieving the best results on some datasets.

### Mathematical Formulas

The key formulas involved in the paper are:

- **Graph Diffusion Matrix**:
  \[ S=\sum_{k=0}^{\infty}\theta_k T^k \]
  where \(T\in\mathbb{R}^{N\times N}\) is the generalized transition matrix and \(\theta_k\) is the weight coefficient.
- **Personalized PageRank (PPR) Kernel**:
  \[ S=\alpha\left(I-(1-\alpha)D^{-1/2}AD^{-1/2}\right)^{-1} \]
  where \(I\in\mathbb{R}^{N\times N}\) is the identity matrix, \(A\) is the adjacency matrix, \(D\) is the diagonal degree matrix, and \(\alpha\in(0,1)\) is the jump probability of the random walk.
- **Contrastive Loss Function**:
  \[ \ell(q_{vi},z_{ui})=-q_{vi}\log p_{ui} \]
  \[ p_{ui}=\text{softmax}\left(\frac{z_{ui}C_v^T}{\tau}\right) \]
  where \(p_{ui}\) is the predicted clustering label of the same node from the first augmented graph and \(\tau\) is the temperature parameter.
- **Total Contrastive Loss**:
  \[ L_c=\frac{1}{N}\sum_{i=0}^{N}\left[\ell(q_{vi},z_{ui})+\ell(q_{ui},z_{vi})\right] \]

Through these formulas, GRCCA contrasts the graph across multiple perspectives.
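
To make the formulas above concrete, here is a minimal NumPy sketch of the PPR kernel and the swapped cluster-assignment loss. It is an illustration under stated assumptions, not the authors' implementation. The function names (`ppr_diffusion`, `cluster_cross_entropy`, `swapped_loss`) are hypothetical. The sketch treats \(C_v\) and \(C_u\) as matrices of cluster prototypes (one row per cluster) and assumes the symmetric term \(\ell(q_{ui},z_{vi})\) scores \(z_{vi}\) against \(C_u\), which the summary does not state explicitly. The assignments `q_v` and `q_u` are taken as given (for example, one-hot rows produced by a clustering step), and the loss is averaged over the N rows of the input.

```python
import numpy as np


def ppr_diffusion(adj: np.ndarray, alpha: float = 0.2) -> np.ndarray:
    """Closed-form PPR kernel: S = alpha * (I - (1 - alpha) * D^{-1/2} A D^{-1/2})^{-1}."""
    deg = adj.sum(axis=1)
    # Assumption of this sketch: isolated nodes (degree 0) get a zero scaling factor.
    inv_sqrt_deg = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    norm_adj = inv_sqrt_deg[:, None] * adj * inv_sqrt_deg[None, :]
    identity = np.eye(adj.shape[0])
    return alpha * np.linalg.inv(identity - (1.0 - alpha) * norm_adj)


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cluster_cross_entropy(q, z, prototypes, tau):
    """Per-node loss l(q, z) = -sum_k q_k * log p_k with p = softmax(z C^T / tau)."""
    p = softmax(z @ prototypes.T / tau)
    # The small constant avoids log(0); it is a numerical guard, not part of the formula.
    return -(q * np.log(p + 1e-12)).sum(axis=1)


def swapped_loss(q_v, q_u, z_v, z_u, c_v, c_u, tau=0.5):
    """L_c: average over nodes of l(q_v, z_u) + l(q_u, z_v)."""
    # First term follows the formula above: z_u is scored against C_v.
    term_vu = cluster_cross_entropy(q_v, z_u, c_v, tau)
    # Assumption: the symmetric term scores z_v against C_u.
    term_uv = cluster_cross_entropy(q_u, z_v, c_u, tau)
    return float(np.mean(term_vu + term_uv))
```

With one-hot assignments, the loss is small when each embedding is closest to the prototype of the cluster assigned to the same node in the other view, and it grows when the two views disagree. The dense matrix inverse in `ppr_diffusion` is only practical for small graphs; the value `alpha=0.2` and the temperature `tau=0.5` are placeholder defaults, not values taken from the paper.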