Abstract: Graph classification, which aims to identify the category labels of graphs, plays a significant role in drug classification, toxicity detection, protein analysis, etc. However, the limited scale of the benchmark datasets makes it easy for graph classification models to fall into over-fitting and undergeneralization. To address this, we introduce data augmentation on graphs (i.e., graph augmentation) and present four methods: random mapping, vertex-similarity mapping, motif-random mapping and motif-similarity mapping, which generate more weakly labeled data for small-scale benchmark datasets via heuristic transformation of graph structures. Furthermore, we propose a generic model evolution framework, named M-Evolve, which combines graph augmentation, data filtration and model retraining to optimize pre-trained graph classifiers. Experiments on six benchmark datasets demonstrate that the proposed framework helps existing graph classification models alleviate over-fitting and undergeneralization when trained on small-scale benchmark datasets, yielding an average accuracy improvement of 3-13% on graph classification tasks.

What problem does this paper attempt to address?

This paper addresses over-fitting and insufficient generalization in graph classification tasks. Because existing benchmark datasets are small, graph classification models trained on them are prone to both problems. To counter this, the authors introduce graph augmentation and propose four augmentation methods: random mapping, vertex-similarity mapping, motif-random mapping, and motif-similarity mapping. These methods heuristically modify the graph structure to generate additional weakly labeled data, thereby enlarging small-scale datasets. The authors also propose a general model evolution framework, M-Evolve, which combines graph augmentation, data filtering, and model retraining to optimize pre-trained graph classifiers. Experimental results show that M-Evolve improves the performance of existing graph classification models on small-scale benchmark datasets, with an average increase in classification accuracy of 3% to 13%.

### Summary of the core issues in the paper:

1. **Over-fitting and insufficient generalization**: Because existing graph classification datasets are small, models are prone to over-fitting and generalize poorly.
2. **Data augmentation**: Graph augmentation generates additional weakly labeled data to enlarge the training dataset.
3. **Model optimization**: The M-Evolve framework combines graph augmentation, data filtering, and model retraining to optimize graph classification models.
### Markdown representation of formulas:

- **Vertex similarity calculation**:

  \[ s_{ij} = \sum_{z \in \Gamma(i) \cap \Gamma(j)} \frac{1}{d_z}, \quad S = \{ s_{ij} \mid \forall (v_i, v_j) \in E_c^{\text{add}} \} \]

  \[ w_{ij}^{\text{add}} = \frac{s_{ij}}{\sum_{s \in S} s}, \quad W^{\text{add}} = \{ w_{ij}^{\text{add}} \mid \forall (v_i, v_j) \in E_c^{\text{add}} \} \]

  Here \(\Gamma(i)\) is the set of neighbors of vertex \(v_i\), \(d_z\) is the degree of vertex \(v_z\), and \(E_c^{\text{add}}\) is the candidate set of edges to add.

- **Weight calculation for deleting edges**:

  \[ w_{ij}^{\text{del}} = 1 - \frac{s_{ij}}{\sum_{s \in S} s}, \quad W^{\text{del}} = \{ w_{ij}^{\text{del}} \mid \forall (v_i, v_j) \in E_c^{\text{del}} \} \]

- **Calculation of label reliability threshold**:

  \[ \theta = \arg \min_{\theta} \sum_{(G_i, y_i) \in D_{\text{val}}} \Phi[(\theta - r_i) \cdot g(G_i, y_i)] \]

  where \(r_i\) is the label reliability of example \((G_i, y_i)\), \(C\) is the classifier, and

  \[ g(G_i, y_i) = \begin{cases} 1 & \text{if } C(G_i) = y_i \\ -1 & \text{otherwise} \end{cases} \]

  \[ \Phi(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases} \]

With these methods, the paper alleviates over-fitting and insufficient generalization in graph classification tasks and improves average classification accuracy by 3% to 13%.
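The formulas above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`ra_similarity`, `sampling_weights`, `reliability_threshold`) and the adjacency-dict graph representation are hypothetical choices made for this example. Two assumptions are worth flagging: the normalizing sum is taken over whichever candidate set is passed in (the formula as written defines \(S\) over \(E_c^{\text{add}}\) only), and the threshold search is restricted to the observed reliability values, which is enough to reach the minimum because the loss is piecewise constant in \(\theta\).

```python
def ra_similarity(adj, i, j):
    """s_ij: sum of 1/d_z over the common neighbours z of vertices i and j."""
    return sum(1.0 / len(adj[z]) for z in adj[i] & adj[j])


def sampling_weights(adj, candidates, mode):
    """Weights for candidate vertex pairs; mode is "add" or "del".

    Assumption: the normalising sum runs over the candidate set passed in.
    """
    scores = {pair: ra_similarity(adj, *pair) for pair in candidates}
    if not scores:
        return {}
    total = sum(scores.values())
    if total == 0:
        # No candidate pair shares a neighbour: fall back to uniform weights.
        return {pair: 1.0 / len(scores) for pair in scores}
    if mode == "add":
        return {pair: s / total for pair, s in scores.items()}
    return {pair: 1.0 - s / total for pair, s in scores.items()}


def reliability_threshold(reliabilities, correct):
    """Pick theta minimising the count of Phi[(theta - r_i) * g_i] = 1.

    g_i is +1 for a correctly classified validation graph and -1 otherwise.
    Ties are resolved in favour of the smallest candidate theta.
    """
    def loss(theta):
        return sum(
            1
            for r, ok in zip(reliabilities, correct)
            if (theta - r) * (1 if ok else -1) > 0
        )

    return min(sorted(set(reliabilities)), key=loss)


if __name__ == "__main__":
    # Toy graph with edges 0-1, 0-2, 1-2, 1-3, 2-3, 3-4.
    adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4}, 4: {3}}
    non_edges = [(0, 3), (0, 4), (1, 4), (2, 4)]
    print(sampling_weights(adj, non_edges, "add"))
    print(reliability_threshold([0.9, 0.8, 0.3, 0.6, 0.2],
                                [True, True, False, True, False]))
```

In this sketch, pairs with many low-degree common neighbours receive a larger addition weight and a smaller deletion weight, so sampling with these weights tends to add edges between similar vertices and remove edges between dissimilar ones.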

M-Evolve: Structural-Mapping-Based Data Augmentation for Graph Classification
