Reinforcement Graph Clustering with Unknown Cluster Number

Yue Liu, Ke Liang, Jun Xia, Xihong Yang, Sihang Zhou, Meng Liu, Xinwang Liu, Stan Z. Li
DOI: https://doi.org/10.48550/arXiv.2308.06827
2023-08-14
Abstract: Deep graph clustering, which aims to group nodes into disjoint clusters by neural networks in an unsupervised manner, has attracted great attention in recent years. Although the performance has been largely improved, the excellent performance of the existing methods heavily relies on an accurately predefined cluster number, which is not always available in real-world scenarios. To enable deep graph clustering algorithms to work without the guidance of a predefined cluster number, we propose a new deep graph clustering method termed Reinforcement Graph Clustering (RGC). In our proposed method, cluster number determination and unsupervised representation learning are unified into a single framework by the reinforcement learning mechanism. Concretely, the discriminative node representations are first learned with the contrastive pretext task. Then, to capture the clustering state accurately with both local and global information in the graph, both node and cluster states are considered. Subsequently, at each state, the qualities of different cluster numbers are evaluated by the quality network, and the greedy action is executed to determine the cluster number. In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters. Extensive experiments demonstrate the effectiveness and efficiency of our proposed method. The source code of RGC is shared at https://github.com/yueliu1999/RGC and a collection (papers, code, and datasets) of deep graph clustering is shared at https://github.com/yueliu1999/Awesome-Deep-Graph-Clustering on GitHub.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses a key problem in deep graph clustering: **effective graph clustering without a predefined number of clusters**. Although existing deep graph clustering methods have achieved significant performance improvements, their performance depends on an accurately predefined number of clusters \(K\), which is not always known in real-world scenarios.

To solve this problem, the authors propose a new deep graph clustering method, **Reinforcement Graph Clustering (RGC)**. By introducing a reinforcement learning mechanism, RGC unifies cluster number determination and unsupervised representation learning in one framework, so that graph clustering no longer requires the number of clusters to be specified in advance.

### Description of key problems

1. **Limitations of existing methods**:
   - Existing deep graph clustering algorithms (such as DEC, DCN, etc.) require the number of clusters \(K\) as input, which is often not feasible in practical applications.
   - Cluster number estimation methods from traditional clustering (such as the elbow method) can be used to determine \(K\), but they incur a high computational cost because the neural network must be trained repeatedly, once per candidate value, to find the best one.
2. **Objectives of RGC**:
   - **Automatically determine the number of clusters**: Through the reinforcement learning mechanism, RGC determines the number of clusters during training.
   - **Improve clustering performance**: By combining self-supervised learning and reinforcement learning, RGC also produces more discriminative node representations, which improves clustering performance.
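To make the cost of elbow-style estimation concrete, here is a minimal sketch that is not taken from the paper. It scans candidate values of \(K\) with a plain NumPy k-means on fixed embeddings and records the within-cluster sum of squares for each. The function names `kmeans_inertia` and `elbow_scan` are hypothetical. In a deep clustering pipeline, each candidate would additionally require retraining the network, which is the overhead described under the limitations above.

```python
import numpy as np

def kmeans_inertia(x, k, n_iter=20, seed=0):
    """Run a basic Lloyd's k-means and return the within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct data points (fancy indexing copies).
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center: shape (n, k).
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old center if a cluster goes empty
                centers[j] = members.mean(axis=0)
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return float(d2.min(axis=1).sum())

def elbow_scan(x, k_values):
    """One full clustering run per candidate K, so cost grows with len(k_values)."""
    return {k: kmeans_inertia(x, k) for k in k_values}
```

The "elbow" is then read off as the value of \(K\) after which the returned inertia stops dropping sharply. The point of the sketch is only that every candidate costs one complete run.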
### Overview of solutions

The main innovations of RGC are:

- **Reinforcement learning framework**: The determination of the number of clusters is modeled as a Markov decision process (MDP). A quality network evaluates the quality of different cluster numbers at each state, and the cluster number with the highest estimated quality is selected greedily.
- **Self-supervised encoder**: A contrastive learning task trains the encoder to produce highly discriminative node representations.
- **Reward function design**: A clustering-oriented reward function enhances cohesion within each cluster and separation between different clusters.

With these designs, RGC can perform efficient and accurate graph clustering without relying on a predefined number of clusters.

### Experimental verification

The experimental results show that RGC performs well on multiple benchmark datasets: it significantly outperforms traditional non-deep clustering methods and remains comparable to state-of-the-art deep parametric clustering methods, even though it does not require the number of clusters to be predefined.

In summary, by introducing a reinforcement learning mechanism, this paper addresses the problem of an unknown number of clusters in deep graph clustering and offers new research ideas and techniques for the field.
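The greedy action and the clustering-oriented reward described in the overview can be illustrated with a small sketch. This is an assumption-laden illustration, not the authors' implementation: the reward below (mean distance between cluster centers minus mean distance of nodes to their own center) is one plausible way to express "cohesion plus separation", and in RGC the quality values come from a learned quality network rather than a fixed array. The names `clustering_reward` and `greedy_cluster_number` are hypothetical.

```python
import numpy as np

def clustering_reward(z, labels):
    """Hypothetical reward: higher when clusters are tight and far apart."""
    ids = np.unique(labels)  # sorted unique cluster ids
    centers = np.stack([z[labels == c].mean(axis=0) for c in ids])
    # Cohesion term: mean distance from each node to its own cluster center.
    own_center = centers[np.searchsorted(ids, labels)]
    cohesion = np.linalg.norm(z - own_center, axis=1).mean()
    if len(ids) < 2:
        return -float(cohesion)  # no separation term with a single cluster
    # Separation term: mean pairwise distance between distinct centers.
    diff = centers[:, None, :] - centers[None, :, :]
    pair = np.linalg.norm(diff, axis=-1)
    separation = pair[np.triu_indices(len(ids), k=1)].mean()
    return float(separation - cohesion)

def greedy_cluster_number(q_values, candidates):
    """Pick the candidate cluster number with the highest estimated quality."""
    return int(candidates[int(np.argmax(q_values))])
```

For example, `greedy_cluster_number(np.array([0.1, 0.7, 0.3]), [2, 3, 4])` returns `3`, the candidate with the largest quality estimate. In a full training loop, the reward from the resulting clustering would be fed back to update the quality estimates.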