CDCGAN: Class Distribution-aware Conditional GAN-based Minority Augmentation for Imbalanced Node Classification

Bojia Liu, Conghui Zheng, Fuhui Sun, Xiaoyan Wang, Li Pan
DOI: https://doi.org/10.1016/j.neunet.2024.106933
IF: 7.8
2024-01-01
Neural Networks
Abstract: Node classification is a fundamental task for Graph Neural Networks (GNNs). However, GNN models tend to suffer from the class imbalance problem, which degrades the representations of minority classes and leads to poor classification performance. The most straightforward and effective solution is to augment the minority samples so that the representations of majority and minority classes are balanced. Previous methods generate new samples from a limited number of labeled nodes, without considering the overall class characteristics, and therefore fail to reflect the underlying class distributions. In addition, they often yield less distinguishable nodes that cannot represent their original classes well, because they may incorporate useless information from other classes when forming node representations. To address these issues, we propose a Class Distribution-aware Conditional Generative Adversarial Network (CDCGAN) that generates diverse and distinguishable minority nodes based on their class distribution characteristics. Specifically, we extract the node embeddings and class distributions while preserving the topology and attribute information, thus capturing the overall class characteristics. The obtained class distributions are then used to design a conditional generator, which incorporates nonlinear transformations to generate diverse minority nodes and leverages adversarial learning to maintain the intrinsic class distribution characteristics. Finally, to ensure the distinguishability of node representations, a unique discriminator is implemented to jointly discriminate and classify the nodes of the augmented graph. Extensive experiments conducted on six datasets demonstrate that the proposed CDCGAN outperforms state-of-the-art methods on widely used evaluation metrics. The source code is available at https://github.com/Crystal-LiuBojia/CDCGAN.
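The idea of augmenting minority classes from class-level statistics rather than from individual labeled nodes can be illustrated with a deliberately simplified sketch. This is not the CDCGAN method: a per-class diagonal Gaussian stands in for the learned conditional generator, there is no adversarial training and no discriminator, and the names `class_statistics` and `augment_minorities` as well as the toy embeddings are assumptions made purely for illustration.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def class_statistics(embeddings, labels):
    """Return {label: (per-dim mean, per-dim std, count)} for each class."""
    by_class = defaultdict(list)
    for vec, label in zip(embeddings, labels):
        by_class[label].append(vec)
    stats = {}
    for label, vecs in by_class.items():
        dims = list(zip(*vecs))  # transpose: one tuple of values per dimension
        stats[label] = ([mean(d) for d in dims], [pstdev(d) for d in dims], len(vecs))
    return stats


def augment_minorities(embeddings, labels, seed=0):
    """Sample synthetic embeddings until every class matches the largest one.

    Assumption: each class is modeled as a diagonal Gaussian; this is a
    stand-in for a learned, class-conditioned generator.
    """
    rng = random.Random(seed)
    stats = class_statistics(embeddings, labels)
    target = max(count for _, _, count in stats.values())
    new_embeddings, new_labels = list(embeddings), list(labels)
    for label, (mu, sigma, count) in stats.items():
        for _ in range(target - count):
            new_embeddings.append([rng.gauss(m, s) for m, s in zip(mu, sigma)])
            new_labels.append(label)
    return new_embeddings, new_labels


# Toy data (hypothetical): class 0 is the majority, class 1 the minority.
embeddings = [[0.0, 0.0], [0.2, 0.1], [0.1, -0.1], [0.0, 0.2], [5.0, 5.0], [5.2, 4.8]]
labels = [0, 0, 0, 0, 1, 1]
aug_x, aug_y = augment_minorities(embeddings, labels)
print(len(aug_x), aug_y.count(0), aug_y.count(1))
```

After augmentation both classes have the same number of samples, and the synthetic minority points are drawn around the minority class's own mean. In the approach the abstract describes, this sampling step would instead be a trained conditional generator, with a discriminator that also classifies nodes to keep the generated samples distinguishable.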