Graph Convolutional Network For Semi-supervised Node Classification With Subgraph Sketching

Zibin Huang, Jun Xian
2024-04-25
Abstract: In this paper, we propose the Graph-Learning-Dual Graph Convolutional Neural Network, called GLDGCN, which extends the classic Graph Convolutional Neural Network (GCN) by introducing a dual convolutional layer and a graph learning layer. We apply GLDGCN to the semi-supervised node classification task. Compared with the baseline methods, we achieve higher classification accuracy on three citation networks, Citeseer, Cora and Pubmed, and we also analyze and discuss the selection of the hyperparameters and network depth. GLDGCN also performs well on the classic social network KarateClub and the new Wiki-CS dataset.
Machine Learning
What problem does this paper attempt to address?
The main problem this paper attempts to address is the set of challenges that Graph Neural Networks (GNNs) face when processing large-scale graph data, in particular two key issues in semi-supervised node classification:

1. **Improve the generalization ability of Graph Convolutional Networks (GCNs) and their ability to handle large-scale graphs**: The paper points out that although GCNs perform well in many scenarios, their generalization ability and their ability to handle large-scale graphs still need improvement. Specifically, traditional GCNs suffer from high computational complexity and large memory consumption on large-scale graphs, which limits their application scope.
2. **Introduce dual-convolution layers and graph-learning layers to enhance the feature extraction ability of GCNs**: To address these problems, the paper proposes a new graph convolutional neural network model, GLDGCN (Graph Learning Dual Graph Convolutional Neural Network). This model enhances the feature extraction ability of GCNs by introducing dual-convolution layers and graph-learning layers, and it can handle data in general matrix form, which expands the application scope of GCNs.

### Specific Solutions

1. **Dual-Convolution Layers**: By introducing dual-convolution layers, GLDGCN uses not only the information in the adjacency matrix but also the information in the PPMI (Positive Pointwise Mutual Information) matrix, and so extracts graph-structure features more comprehensively. The PPMI matrix supplements the features of the adjacency matrix and enhances the representational ability of the model.
2. **Graph-Learning Layers**: The graph-learning layer enables GCNs to accept general matrix data as input and to generate a reasonable graph structure from it. This layer learns the graph structure by optimizing a loss function, which improves the adaptability and robustness of the model.
3. **Subgraph Clustering and Stochastic Gradient Descent Techniques**: To handle large-scale graph data, the paper introduces subgraph clustering and stochastic gradient descent techniques and designs a Cluster-based Graph Convolutional Neural Network (Cluster GCN). This technique reduces computational complexity through mini-batch training, enabling GCNs to handle large-scale graph data efficiently.

### Experimental Results

- **Performance on Benchmark Datasets**: The paper conducts experiments on three classic citation network datasets (Citeseer, Cora, and Pubmed). The results show that GLDGCN achieves higher classification accuracy than the baseline methods on these datasets. For example, on the Cora dataset, the classification accuracy of GLDGCN reaches 85.8%, which is significantly higher than that of the other methods.
- **The Influence of the Number of Training Samples**: The paper also explores how the number of training samples affects classification accuracy. The experimental results show that when training samples account for 2% or more of the total samples, GLDGCN shows good classification accuracy and model stability, demonstrating strong semi-supervised learning ability.
- **The Influence of the Number of Network Layers**: The paper further discusses how the number of network layers affects classification accuracy. The experimental results show that as the number of layers increases, classification accuracy decreases to some extent but remains relatively stable overall. This suggests that increasing the number of layers in GCNs may degrade learning performance, especially when processing large-scale graphs.
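To make the dual-convolution idea under Specific Solutions more concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes the PPMI matrix is computed directly from the adjacency matrix used as a co-occurrence count matrix (the paper may build the counts differently, for example from random walks), that both branches share one weight matrix, and that each branch uses the standard symmetric GCN normalization with self-loops. The function names `ppmi`, `sym_normalize`, and `dual_gcn_layer` are hypothetical, and how GLDGCN combines the two branch outputs is not stated in this summary, so the sketch simply returns both.

```python
import numpy as np


def ppmi(freq: np.ndarray) -> np.ndarray:
    """Positive pointwise mutual information of a non-negative count matrix."""
    total = freq.sum()
    row = freq.sum(axis=1, keepdims=True)
    col = freq.sum(axis=0, keepdims=True)
    expected = row @ col / total  # counts expected if rows and columns were independent
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(freq / expected)
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts give -inf or nan; treat them as 0
    return np.maximum(pmi, 0.0)   # keep only positive associations


def sym_normalize(m: np.ndarray) -> np.ndarray:
    """Add self-loops and apply D^{-1/2} (M + I) D^{-1/2}."""
    m = m + np.eye(m.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(m.sum(axis=1))
    return m * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def dual_gcn_layer(adj: np.ndarray, x: np.ndarray, w: np.ndarray):
    """One ReLU graph convolution per branch, with shared weights (an assumption)."""
    h_adj = np.maximum(sym_normalize(adj) @ x @ w, 0.0)         # adjacency branch
    h_ppmi = np.maximum(sym_normalize(ppmi(adj)) @ x @ w, 0.0)  # PPMI branch
    return h_adj, h_ppmi


# Toy example: a 4-node path graph with one-hot node features.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = np.eye(4)
w = np.random.default_rng(0).normal(size=(4, 2))
h_adj, h_ppmi = dual_gcn_layer(adj, x, w)
print(h_adj.shape, h_ppmi.shape)
```

The two branches see the same features through different structural views: the adjacency branch encodes direct links, while the PPMI branch reweights links by how much more often two nodes co-occur than chance would predict.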
### Summary

This paper addresses the computational-complexity and memory-consumption problems of traditional GCNs on large-scale graph data by introducing dual-convolution layers and graph-learning layers, together with subgraph clustering and stochastic gradient descent techniques, and it improves performance on semi-supervised node classification. These innovations provide new ideas and methods for extending graph neural networks to practical applications.
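The mini-batch idea behind the subgraph clustering technique can be illustrated with a short NumPy sketch. It is only an illustration under stated assumptions: nodes are split into clusters by a random permutation, which stands in for a proper graph-clustering routine, and the helper names `random_clusters`, `induced_subgraph`, and `minibatches` are hypothetical. Each batch carries only the adjacency block induced by one cluster, so a training step touches roughly |cluster|² adjacency entries instead of n².

```python
import numpy as np


def random_clusters(num_nodes: int, num_clusters: int, seed: int = 0):
    """Split node ids into roughly equal groups (stand-in for graph clustering)."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(num_nodes), num_clusters)


def induced_subgraph(adj: np.ndarray, features: np.ndarray, nodes: np.ndarray):
    """Keep only the rows and columns of adj, and rows of features, for `nodes`."""
    return adj[np.ix_(nodes, nodes)], features[nodes]


def minibatches(adj: np.ndarray, features: np.ndarray, clusters):
    """Yield one (node ids, sub-adjacency, sub-features) batch per cluster."""
    for nodes in clusters:
        sub_adj, sub_x = induced_subgraph(adj, features, nodes)
        yield nodes, sub_adj, sub_x


# Toy example: 6 nodes on a ring, 2 features per node, 3 clusters.
n = 6
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
features = np.arange(n * 2, dtype=float).reshape(n, 2)

for nodes, sub_adj, sub_x in minibatches(adj, features, random_clusters(n, 3)):
    # A gradient step on the GCN loss restricted to `nodes` would go here.
    print(nodes, sub_adj.shape, sub_x.shape)
```

Edges that cross cluster boundaries are dropped within a batch, which is the price paid for the smaller memory footprint; a clustering that keeps most edges inside clusters reduces that loss.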