Heterogeneous Graph Condensation

Jian Gao,Jianshe Wu,Jingyi Ding
DOI: https://doi.org/10.1109/tkde.2024.3362863
IF: 9.235
2024-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract:Graph neural networks greatly facilitate data processing in homogeneous and heterogeneous graphs. However, training GNNs on large-scale graphs poses a significant challenge to computing resources. It is especially prominent on heterogeneous graphs, which contain multiple types of nodes and edges, and heterogeneous GNNs are also several times more complex than the ordinary GNNs. Recently, Graph condensation (GCond) is proposed to address the challenge by condensing large-scale homogeneous graphs into small-scale informative graphs. Its label-based feature initialization and fully-connected design perform well on homogeneous graphs. While in heterogeneous graphs, label information generally only exists in specific types of nodes, making it difficult to be applied directly to heterogeneous graphs. In this paper, we propose heterogeneous graph condensation (HGCond). HGCond uses clustering information instead of label information for feature initialization, and constructs a sparse connection scheme accordingly. In addition, we found that the simple parameter exploration strategy in GCond leads to insufficient optimization on heterogeneous graphs. This paper proposes an exploration strategy based on orthogonal parameter sequences to address the problem. We experimentally demonstrate that the novel feature initialization and parameter exploration strategy is effective. Experiments show that HGCond significantly outperforms baselines on multiple datasets. On the dataset DBLP, HGCond can condense DBLP to 0.5% of its original scale to obtain DBLP-0.005. GNNs trained on DBLP-0.005 can retain nearly 99% accuracy compared to the GNNs trained on full-scale DBLP.
computer science, information systems, artificial intelligence,engineering, electrical & electronic
What problem does this paper attempt to address?