Learning Uniform Clusters on Hypersphere for Deep Graph-level Clustering

Mengling Hu, Chaochao Chen, Weiming Liu, Xinyi Zhang, Xinting Liao, Xiaolin Zheng
2023-11-23
Abstract: Graph clustering has been widely studied in recent years. However, most existing graph clustering methods focus on node-level clustering, i.e., grouping the nodes of a single graph into clusters. In contrast, graph-level clustering, i.e., grouping multiple graphs into clusters, remains largely unexplored. Graph-level clustering is critical in a variety of real-world applications, such as molecular property prediction and community analysis in social networks. However, graph-level clustering is challenging because graph-level representations are insufficiently discriminative, which makes deep clustering more likely to produce degenerate solutions (cluster collapse). To address this issue, we propose a novel deep graph-level clustering method called Uniform Deep Graph Clustering (UDGC). UDGC assigns instances evenly to different clusters and then scatters those clusters on the unit hypersphere, leading to a more uniform cluster-level distribution and less severe cluster collapse. Specifically, we first propose Augmentation-Consensus Optimal Transport (ACOT) for generating uniformly distributed and reliable pseudo labels for partitioning clusters. Then we adopt contrastive learning to scatter those clusters. In addition, we propose Center Alignment Optimal Transport (CAOT) for guiding the model to learn better parameters, which further improves clustering performance. Our empirical study on eight well-known datasets demonstrates that UDGC significantly outperforms the state-of-the-art models.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper addresses graph-level clustering. Specifically, it aims to improve graph-level clustering performance and to mitigate the "cluster collapse" problem encountered in deep graph clustering.

#### Background and Challenges

- **Limitations of Existing Methods**: Most current graph clustering methods focus on node-level clustering, i.e., grouping the nodes of a single graph into clusters. Graph-level clustering, i.e., grouping multiple graphs into clusters, has received far less attention.
- **Application Scenarios**: Graph-level clustering plays an important role in real-world applications such as molecular property prediction and community analysis in social networks.
- **Challenges**:
  - Graph-level representations are usually obtained by global pooling (e.g., mean pooling, sum pooling) over all node-level representations, which can lose information.
  - Unlike node-level clustering, which can draw additional supervision signals from neighbors, graph-level clustering has no such extra signals, so its representations are less discriminative.
  - Deep graph clustering is prone to degenerate solutions, where all representations converge to a single point (representation collapse).

#### Proposed Method

To address these challenges, the authors propose a novel deep graph-level clustering method called **Uniform Deep Graph Clustering (UDGC)**. Its main components are:

1. **Uniformly Distributed Pseudo-Label Generation Module**: Generates uniformly distributed pseudo-labels through Augmentation-Consensus Optimal Transport (ACOT), ensuring that each cluster has a sufficient number of samples.
2. **Representation Enhancement Module**: Uses contrastive learning to scatter the clusters on the unit hypersphere, which mitigates cluster collapse.
3. **Center Alignment Optimal Transport (CAOT)**: Further enhances the consistency between the representation space and the model parameter space, guiding the model to learn better parameters.

#### Experimental Results

- Experiments on eight commonly used datasets show that UDGC significantly outperforms existing state-of-the-art models.
- Notably, UDGC performs well on both balanced and imbalanced datasets.

Through these improvements, UDGC enhances the performance of deep graph-level clustering and mitigates the problem of cluster collapse.
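
The idea of assigning instances evenly to clusters with optimal transport can be illustrated with a small sketch. The code below is not the authors' ACOT, which additionally relies on consensus between augmented views; it is a generic Sinkhorn-Knopp balancing of a score matrix, a common way to obtain roughly uniform pseudo-labels. The function names `sinkhorn_pseudo_labels` and `hard_labels`, the parameters `epsilon` and `n_iters`, and the toy scores are all assumptions made for illustration.

```python
import math


def sinkhorn_pseudo_labels(scores, epsilon=0.1, n_iters=50):
    """Turn an n x k score matrix into balanced soft cluster assignments.

    Alternately rescales columns and rows (Sinkhorn-Knopp) so that every
    cluster receives roughly the same total mass. Hypothetical helper,
    not the paper's ACOT.
    """
    n, k = len(scores), len(scores[0])
    # Gibbs kernel exp(score / epsilon); subtracting the global max keeps
    # the exponentials from overflowing.
    top = max(max(row) for row in scores)
    q = [[math.exp((s - top) / epsilon) for s in row] for row in scores]
    for _ in range(n_iters):
        # Column step: each cluster gets total mass 1/k.
        col = [sum(row[j] for row in q) for j in range(k)]
        q = [[row[j] / (col[j] * k) for j in range(k)] for row in q]
        # Row step: each graph gets total mass 1/n.
        q = [[v / (sum(row) * n) for v in row] for row in q]
    # Multiply by n so each row is a probability distribution over clusters.
    return [[v * n for v in row] for row in q]


def hard_labels(matrix):
    """Index of the largest entry in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in matrix]


# Toy scores for 4 graphs and 2 clusters (assumed values). Every graph
# prefers cluster 0, which is the collapsed case.
scores = [[2.0, 1.0], [1.8, 1.1], [1.6, 1.4], [1.5, 1.45]]
naive = hard_labels(scores)
soft = sinkhorn_pseudo_labels(scores)
balanced = hard_labels(soft)
print("naive argmax:", naive)
print("balanced:    ", balanced)
```

On this toy input the naive argmax places all four graphs in cluster 0, while the balanced assignment splits them two and two: the graphs with the weakest preference for cluster 0 are moved to cluster 1. A smaller `epsilon` gives sharper assignments but slower convergence, and very small values can underflow in this plain-Python version.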