GCN-based weakly-supervised community detection with updated structure centres selection

Liping Deng, Bing Guo, Wen Zheng
DOI: https://doi.org/10.1080/09540091.2023.2291995
2024-01-04
Connection Science
Abstract: Community detection is a classic problem in network learning. Semi-supervised network learning requires a certain number of known samples, but sample annotation is time-consuming and laborious. In particular, when only a very small number of known samples is available, the learning ability of existing semi-supervised network learning models decreases sharply. In view of this, a weakly-supervised community detection method based on graph convolutional neural network (WC-GCN) is proposed. Firstly, it introduces a genetic evolution strategy to select and update the structure centres, which keeps the centre-updating process from getting stuck in local optima and yields structural centres closer to the global best, solving the problem of centre dependence. Secondly, the structural centrality index Cstruct is proposed to measure the representativeness of a subgraph, so that more accurate network structure centres are learned. Thirdly, a self-training method is used to expand the pseudo-labelled nodes for GCN training, further improving the model's performance. The proposed method is evaluated on various real-world networks, and the results show that it outperforms state-of-the-art community detection algorithms.
computer science, artificial intelligence, theory & methods
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper aims to address the problem of community detection in complex networks, particularly how to effectively perform community detection when label data is very limited. Specifically, the paper focuses on the following aspects:

1. **Limited Label Data**:
   - Semi-supervised network learning methods usually require a certain number of known samples (i.e., labeled nodes), but obtaining these label data is both time-consuming and labor-intensive.
   - When the number of known labeled nodes is very small, the learning ability of existing semi-supervised network learning models significantly decreases.
2. **Structural Center Dependency**:
   - Existing community detection methods heavily rely on selected structural centers. Traditional structure-based center selection methods highly depend on the initial structural center nodes during the expansion process and only consider network topology information, leading to poor community discovery results.
3. **Improving Model Performance**:
   - How to improve the performance of Graph Convolutional Networks (GCNs) for community detection by improving the selection and updating methods of structural centers and using self-training methods to expand pseudo-labeled nodes when label data is limited.

### Solutions

To overcome the above problems, the paper proposes a weakly supervised community detection method based on Graph Convolutional Networks (GCNs) called WC-GCN. The main contributions include:

1. **Weakly Supervised Community Detection Method**:
   - Proposes a weakly supervised community detection method based on GCNs (WC-GCN), which can learn important nodes (structural centers) of the network using a very small number of known labeled samples and further explore network communities after expanding the training set.
2. **Genetic Evolution Strategy for Selecting Structural Centers**:
   - Uses a genetic evolution strategy to select and update structural centers, ensuring that the updating process does not fall into local optima, thereby obtaining structural centers closer to the global optimum. This solves the center dependency problem. Particularly, when the number of known labeled samples is very small and there are mislabeled samples, the genetic evolution strategy has high fault tolerance.
3. **Structural Center Metric**:
   - For the first time, proposes a structural center metric \( C_{\text{struct}} \), which can well reflect the ability of a subgraph to represent the entire network, learning more accurate network structural centers.
4. **Self-Training Method**:
   - Uses a self-training method to expand some high-confidence neighboring nodes, establishing a balanced training set and further improving model performance.

### Conclusion

The method has been evaluated on various real-world networks, and the results show that its performance is superior to existing state-of-the-art community detection algorithms. Through these improvements, the paper effectively addresses the issues of limited label data and structural center dependency, enhancing the accuracy and robustness of community detection.
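
### Illustrative Sketch: Self-Training Expansion

The self-training step listed under Solutions (expanding high-confidence neighboring nodes into a balanced training set) might look roughly like the sketch below. This is not the authors' implementation. The function name `expand_pseudo_labels`, the 0.9 confidence threshold, and the "equal quota per class" balancing rule are all assumptions made for illustration, and the class probabilities are taken as given rather than produced by a GCN.

```python
from collections import defaultdict


def expand_pseudo_labels(probs, labels, adjacency, threshold=0.9, per_class=None):
    """Return a new label dict with high-confidence neighbours added.

    probs:     {node: [p_class0, p_class1, ...]} from any trained classifier
    labels:    {node: class_index} for the currently labelled nodes
    adjacency: {node: set_of_neighbour_nodes}
    threshold: minimum predicted probability to accept a pseudo-label (assumed value)
    per_class: optional cap on pseudo-labels added per class (hypothetical parameter)
    """
    # Candidates: unlabelled nodes adjacent to at least one labelled node.
    candidates = set()
    for node in labels:
        candidates.update(n for n in adjacency.get(node, ()) if n not in labels)

    # Group sufficiently confident candidates by their predicted class.
    by_class = defaultdict(list)
    for node in candidates:
        p = probs[node]
        best = max(range(len(p)), key=p.__getitem__)
        if p[best] >= threshold:
            by_class[best].append((p[best], node))

    if not by_class:
        return dict(labels)

    # Balance (assumed rule): every class that has confident candidates
    # receives the same number of new nodes, limited by the scarcest class.
    quota = min(len(scored) for scored in by_class.values())
    if per_class is not None:
        quota = min(quota, per_class)

    expanded = dict(labels)
    for cls, scored in by_class.items():
        # Most confident first; node id breaks ties deterministically.
        scored.sort(key=lambda item: (-item[0], str(item[1])))
        for _, node in scored[:quota]:
            expanded[node] = cls
    return expanded


# Toy graph: nodes 0 and 3 are labelled, the rest are unlabelled.
adjacency = {0: {1, 2}, 1: {0, 3}, 2: {0}, 3: {1, 4, 5}, 4: {3}, 5: {3}}
labels = {0: 0, 3: 1}
probs = {
    0: [1.00, 0.00],
    1: [0.55, 0.45],  # too uncertain, stays unlabelled
    2: [0.95, 0.05],
    3: [0.00, 1.00],
    4: [0.02, 0.98],
    5: [0.08, 0.92],  # confident, but dropped to keep the classes balanced
}
expanded = expand_pseudo_labels(probs, labels, adjacency)
```

In a full pipeline this expansion would typically alternate with retraining the classifier on the enlarged label set. The sketch shows only a single expansion round.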