Attributed Graph Clustering in Collaborative Settings

Rui Zhang, Xiaoyang Hou, Zhihua Tian, Jian Liu, Qingbiao Wu, Kui Ren
2024-11-19
Abstract: Graph clustering is an unsupervised machine learning method that partitions the nodes in a graph into different groups. Despite achieving significant progress in exploiting both attributed and structured data information, graph clustering methods often face practical challenges related to data isolation. Moreover, the absence of collaborative methods for graph clustering limits their effectiveness. In this paper, we propose a collaborative graph clustering framework for attributed graphs, supporting attributed graph clustering over vertically partitioned data with different participants holding distinct features of the same data. Our method leverages a novel technique that reduces the sample space, improving the efficiency of the attributed graph clustering method. Furthermore, we compare our method to its centralized counterpart under a proximity condition, demonstrating that the successful local results of each participant contribute to the overall success of the collaboration. We fully implement our approach and evaluate its utility and efficiency by conducting experiments on four public datasets. The results demonstrate that our method achieves comparable accuracy levels to centralized attributed graph clustering methods. Our collaborative graph clustering framework provides an efficient and effective solution for graph clustering challenges related to data isolation.
Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is how to perform attributed graph clustering effectively in a collaborative setting where different participants hold different features of the same data. Specifically, the paper focuses on graph clustering over vertically partitioned data: each participant holds a different subset of the features for the nodes of a common graph. Because of data isolation and privacy requirements, these data cannot be shared directly. The paper therefore proposes a collaborative graph-clustering framework that aims to improve clustering quality through collaborative learning while preserving data privacy.

### Core Problems of the Paper

1. **Data Isolation Challenge**: Although existing graph-clustering methods have made significant progress in exploiting attributed and structured data, in practice they often run into data isolation. The data held by different participants cannot be shared directly, which limits the effectiveness of traditional clustering methods.
2. **Lack of Collaborative Mechanism**: Effective collaborative graph-clustering methods are currently lacking, especially for vertically partitioned data, where each participant holds different features of the same data.

### Solutions

The paper proposes a new collaborative graph-clustering framework designed for vertically partitioned attributed graphs. Its main contributions are:

1. **Communication-efficient Collaborative Learning Framework (kCAGC)**: Presented as the first unsupervised graph-clustering method for vertically partitioned data, it improves clustering quality through collaborative learning while preserving data privacy.
2. **Reduction of Communication Costs**: By communicating only local clustering results, the communication complexity is reduced from \(O(n)\) to \(O(k^3)\), where \(k\) is the number of local clusterings and \(n\) is the size of the dataset.
3. **Theoretical Guarantee**: The paper extends the classical proximity-based theoretical framework and proves that, when the "restricted proximity condition" is met, the performance of the method is comparable to that of the centralized method.
4. **Experimental Verification**: Experiments on four public datasets verify the effectiveness and efficiency of kCAGC. The results show performance comparable to that of the centralized method, and even to that of the semi-supervised GraphSAGE model.

### Summary

This paper addresses how to achieve efficient and accurate attributed graph clustering through collaborative learning under data isolation and privacy requirements. The proposed kCAGC framework comes with a theoretical guarantee and also performs well in experiments.
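To make the vertically partitioned setting and the idea of working from local clustering results more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's kCAGC protocol: the helper names (`kmeans`, `vertical_split`, `local_summaries`, `combine_labels`) are hypothetical, plain k-means stands in for whatever local clustering the authors use, and the graph structure and any privacy mechanism are left out entirely. Each participant clusters only its own feature columns; nodes are then grouped by their tuple of local cluster ids, so later steps can work with at most `k ** L` cells (for `L` participants) instead of `n` individual nodes.

```python
import numpy as np


def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct rows of X.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distances, shape (n, k).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Recompute labels so they match the final centroids.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1), centroids


def vertical_split(X, n_participants):
    """Give each participant a disjoint block of feature columns (same rows)."""
    return np.array_split(X, n_participants, axis=1)


def local_summaries(parts, k):
    """Each participant clusters its own columns independently."""
    return [kmeans(part, k, seed=i) for i, part in enumerate(parts)]


def combine_labels(label_lists, k):
    """Encode each node's tuple of local cluster ids as one joint cell id."""
    joint = np.zeros_like(label_lists[0])
    for labels in label_lists:
        joint = joint * k + labels
    return joint


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(200, 6))          # 200 nodes, 6 features (synthetic)
    parts = vertical_split(X, 2)           # two participants, 3 features each
    summaries = local_summaries(parts, k=3)
    joint = combine_labels([labels for labels, _ in summaries], k=3)
    print("distinct joint cells:", len(np.unique(joint)), "out of at most", 3 ** 2)
```

The number of joint cells depends only on `k` and the number of participants, not on `n`, which is the sense in which this sketch shrinks the sample space. How kCAGC actually exchanges and combines local results, and how it arrives at the stated \(O(k^3)\) cost, is specified in the paper itself.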