Subgraph generation applied in GraphSAGE deal with imbalanced node classification

Kai Huang, Chen Chen
DOI: https://doi.org/10.1007/s00500-024-09797-7
IF: 3.732
2024-07-14
Soft Computing
Abstract: In graph neural network applications, GraphSAGE uses inductive learning and has been widely applied to important research topics such as node classification. Because GraphSAGE applies an aggregation function to obtain each node's embedding from its neighbors' features, the subgraph around a node directly affects classification performance. In many practical applications, the uneven class distribution of nodes makes it difficult for a graph neural network to fully learn the topology and attributes of the minority class, which limits classification performance. To address imbalanced node classification in GraphSAGE, we propose a new graph over-sampling algorithm called subgraph generation by conditional generative adversarial network (SG-CGAN). SG-CGAN learns the hidden-layer representations of different nodes through GraphSAGE and trains a conditional generative adversarial network (CGAN) on the nodes' hidden vectors and their related subgraphs. Synthetic hidden vectors are then generated and fed to the CGAN to generate subgraphs for the minority class, and GraphSAGE is retrained with the synthetic subgraphs added. In the experiments on five graph datasets with first-order neighbors, the average improvement in ACC, macro-F1, and micro-F1 was , , and , respectively, compared to not adding synthetic data. In the second-order neighbor experiments, the percentages were , , and , verifying the effectiveness of the SG-CGAN generated data.
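To illustrate why a node's subgraph matters to GraphSAGE, the following is a minimal sketch of mean aggregation over first-order neighbors. It is not the authors' implementation: the function name `mean_aggregate`, the toy graph, and the feature values are assumptions made for illustration, and the learned weight matrix and nonlinearity of a real GraphSAGE layer are omitted.

```python
from typing import Dict, List


def mean_aggregate(
    features: Dict[int, List[float]],
    neighbors: Dict[int, List[int]],
    node: int,
) -> List[float]:
    """Concatenate a node's own features with the mean of its neighbors' features.

    Simplified GraphSAGE-style mean aggregation (hypothetical helper):
    no learned weights, no activation, first-order neighbors only.
    """
    dim = len(features[node])
    nbrs = neighbors.get(node, [])
    if nbrs:
        # Element-wise mean over the feature vectors of all neighbors.
        agg = [sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]
    else:
        # An isolated node has nothing to aggregate; fall back to zeros.
        agg = [0.0] * dim
    return list(features[node]) + agg


# Toy graph (illustrative values only): node 0 is linked to nodes 1 and 2.
features = {0: [1.0, 0.0], 1: [0.0, 2.0], 2: [4.0, 2.0], 3: [5.0, 5.0]}
neighbors = {0: [1, 2], 1: [0], 2: [0]}

embedding = mean_aggregate(features, neighbors, 0)
isolated = mean_aggregate(features, neighbors, 3)
```

Because the aggregated half of the embedding depends entirely on which neighbors are present, adding synthetic neighbors or subgraphs for a minority-class node changes the representation the classifier sees.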
computer science, artificial intelligence, interdisciplinary applications