GC-Bench: An Open and Unified Benchmark for Graph Condensation

Qingyun Sun, Ziying Chen, Beining Yang, Cheng Ji, Xingcheng Fu, Sheng Zhou, Hao Peng, Jianxin Li, Philip S. Yu
2024-06-30
Abstract: Graph condensation (GC) has recently garnered considerable attention due to its ability to reduce large-scale graph datasets while preserving their essential properties. The core concept of GC is to create a smaller, more manageable graph that retains the characteristics of the original graph. Despite the proliferation of graph condensation methods developed in recent years, there is no comprehensive evaluation and in-depth analysis, which creates a great obstacle to understanding the progress in this field. To fill this gap, we develop a comprehensive Graph Condensation Benchmark (GC-Bench) to analyze the performance of graph condensation in different scenarios systematically. Specifically, GC-Bench systematically investigates the characteristics of graph condensation in terms of the following dimensions: effectiveness, transferability, and complexity. We comprehensively evaluate 12 state-of-the-art graph condensation algorithms in node-level and graph-level tasks and analyze their performance in 12 diverse graph datasets. Further, we have developed an easy-to-use library for training and evaluating different GC methods to facilitate reproducible research. The GC-Bench library is available at https://github.com/RingBDStack/GC-Bench.
Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is: **how to systematically evaluate and analyze the effectiveness, transferability, and efficiency of Graph Condensation (GC) methods, for which no comprehensive evaluation previously existed**. Specifically, the paper focuses on the following aspects:

1. **Effectiveness**:
   - Evaluate the performance of existing GC methods across different datasets and condensation ratios.
   - Investigate how structural characteristics and initialization mechanisms influence GC performance.
2. **Transferability**:
   - Explore whether the condensed graph can be transferred between different types of tasks (such as link prediction, node clustering, and anomaly detection).
   - Analyze the transfer performance of GC methods across different backbone model architectures (such as SGC, GCN, GraphSAGE, APPNP, ChebyNet, and Graph Transformer).
3. **Efficiency**:
   - Evaluate the time and space efficiency of GC methods, especially the computational and storage costs on large-scale graph data.

### Specific contributions of the paper

To achieve the above goals, the paper introduces **GC-Bench**, an open and unified benchmarking platform for systematically evaluating existing graph condensation methods. The main contributions include:

- **Comprehensive benchmarking**: GC-Bench integrates 12 representative and competitive GC methods, covering node-level and graph-level tasks, and analyzes them along multiple dimensions: effectiveness, transferability, and efficiency.
- **Key findings**:
  1. Graph-level GC methods are far from the goal of lossless compression, and a larger condensation ratio does not necessarily lead to better performance.
  2. GC methods can preserve the semantic information of the graph structure in the condensed graph, but there is still room for improvement in maintaining complex structural characteristics.
  3. Condensed datasets perform poorly on tasks other than the one they were condensed for, which limits their applicability.
  4. GC methods that rely on backbone models embed model-specific information in the condensed dataset, and popular graph transformers are incompatible with current GC methods.
  5. The effect of the initialization mechanism on performance and convergence speed depends on the characteristics of the dataset and the GC method.
  6. Most GC methods that combine backbone models and full-dataset training perform poorly in terms of time and space efficiency, which goes against the original intention of using GC for efficient training.
- **Open-source benchmark library and future directions**: GC-Bench is open-source and easy to extend to new methods and datasets, which helps identify directions for further exploration and promotes future research.

Through these efforts, the paper aims to provide valuable insights for research in the field of graph condensation and to point out directions for future improvement and development.
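The effectiveness dimension described above implies an evaluation protocol that can be sketched in a few lines: condense the training portion of a graph, train a backbone on the condensed graph only, then measure accuracy on the original graph's test nodes. The following is a minimal illustration under loud assumptions, not GC-Bench's API or any of the benchmarked methods. The condensation step (`x_syn`, one edge-free synthetic node per class built from class means) is a naive placeholder, the toy graph is randomly generated, the condensation ratio is assumed to be condensed nodes divided by original nodes, and all function names are hypothetical. The backbone is SGC-style, that is, softmax regression on features propagated through the normalized adjacency.

```python
import numpy as np


def condensation_ratio(n_condensed: int, n_original: int) -> float:
    """Assumed definition: condensed node count over original node count."""
    if n_original <= 0:
        raise ValueError("n_original must be positive")
    return n_condensed / n_original


def sgc_features(adj: np.ndarray, x: np.ndarray, k: int = 2) -> np.ndarray:
    """SGC-style propagation: apply D^-1/2 (A + I) D^-1/2 to x, k times."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    a_norm = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    for _ in range(k):
        x = a_norm @ x
    return x


def train_softmax(x, y, n_classes, lr=0.1, steps=200):
    """Plain gradient-descent softmax regression (no bias term)."""
    w = np.zeros((x.shape[1], n_classes))
    onehot = np.eye(n_classes)[y]
    for _ in range(steps):
        logits = x @ w
        logits = logits - logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        p = p / p.sum(axis=1, keepdims=True)
        w -= lr * x.T @ (p - onehot) / len(y)
    return w


# Toy "original" graph: two communities of 20 nodes with shifted features.
rng = np.random.default_rng(0)
n, d, n_classes = 40, 4, 2
y = np.repeat([0, 1], n // 2)
x = rng.normal(size=(n, d)) + (2 * y - 1)[:, None] * 2.0
same_class = y[:, None] == y[None, :]
edge_prob = np.where(same_class, 0.3, 0.02)
upper = np.triu(rng.random((n, n)) < edge_prob, k=1)
adj = (upper | upper.T).astype(float)

# Stratified split: first 10 nodes of each class train, the rest test.
train_idx = np.concatenate([np.arange(0, 10), np.arange(20, 30)])
test_idx = np.concatenate([np.arange(10, 20), np.arange(30, 40)])

# Placeholder condensation (NOT a real GC method): one synthetic node per
# class, holding the mean propagated feature of that class's training nodes.
h = sgc_features(adj, x)
x_syn = np.stack(
    [h[train_idx][y[train_idx] == c].mean(axis=0) for c in range(n_classes)]
)
y_syn = np.arange(n_classes)
adj_syn = np.zeros((n_classes, n_classes))  # condensed graph has no edges

# Train on the condensed graph only, evaluate on original test nodes.
w = train_softmax(sgc_features(adj_syn, x_syn), y_syn, n_classes)
pred = (h[test_idx] @ w).argmax(axis=1)
acc = float((pred == y[test_idx]).mean())
ratio = condensation_ratio(len(y_syn), n)

print(f"condensation ratio: {ratio:.3f}, test accuracy: {acc:.3f}")
```

Swapping `train_softmax` for a different backbone while keeping `x_syn` fixed would give a rough analogue of the cross-architecture transferability test, and timing the condensation step against training on the full graph would mirror the efficiency dimension. Accuracy on a toy graph this easy says nothing about how real GC methods behave.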