Graph data augmentation with Gromov-Wasserstein Barycenters

Andrea Ponti
2024-04-12
Abstract: Graphs are ubiquitous in various fields, and deep learning methods have been successfully applied to graph classification tasks. However, building large and diverse graph datasets for training can be expensive. While augmentation techniques exist for structured data such as images or numerical data, the augmentation of graph data remains challenging, primarily because of the complex, non-Euclidean nature of graph data. This paper proposes a novel augmentation strategy for graphs that operates in a non-Euclidean space. The approach leverages graphon estimation, which models the generative mechanism of network sequences. Computational results demonstrate the effectiveness of the proposed augmentation framework in improving the performance of graph classification models. Additionally, using a non-Euclidean distance, specifically the Gromov-Wasserstein distance, results in better approximations of the graphon. The framework also provides a means to validate different graphon estimation approaches, particularly in real-world scenarios where the true graphon is unknown.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper asks how to augment graph data effectively in non-Euclidean spaces in order to improve performance on graph classification tasks. Building large and diverse graph datasets for training is expensive, and augmenting graph data remains difficult, mainly because of its complex, non-Euclidean nature.

### Main problems

1. **High cost of constructing large and diverse graph datasets**:
   - In practice, building graph datasets for training neural methods is very expensive.
2. **Limitations of existing graph data augmentation techniques**:
   - Existing graph augmentation strategies usually operate within a single graph (for example, by modifying edges or nodes) and cannot exchange information between different instances.
   - Traditional augmentation techniques (such as those for image, video, or text data) are hard to apply directly to graph data because of its complex non-Euclidean structure.

### Solutions

The paper proposes a new graph data augmentation strategy based on graphon estimation and Gromov-Wasserstein barycenters. The method uses a graphon to model the generative mechanism of network sequences, and computational results demonstrate its effectiveness in improving the performance of graph classification models.

### Key points

- **Graphon**: A graphon is the limit object of a sequence of large graphs and can be used to generate graphs of arbitrary size. A new graph is created by sampling node positions from a uniform distribution and generating an adjacency matrix according to the graphon.
- **Gromov-Wasserstein distance**: This is a non-Euclidean distance that is especially suited to graph data. Using it yields better approximations of the graphon and helps validate different graphon estimation methods.
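
The graphon sampling step described under the key points can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the paper's implementation: the function names and the two-block graphon are hypothetical, and the graphon itself is assumed to be given (in the paper it would be estimated from data, for example through a Gromov-Wasserstein barycenter).

```python
import random


def sample_graph_from_graphon(graphon, n_nodes, seed=None):
    """Sample an undirected simple graph from a graphon W: [0,1]^2 -> [0,1].

    Returns the adjacency matrix as a list of lists of 0/1 entries.
    """
    rng = random.Random(seed)
    # One latent position per node, drawn uniformly from [0, 1).
    u = [rng.random() for _ in range(n_nodes)]
    adj = [[0] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            # Connect i and j with probability W(u_i, u_j).
            if rng.random() < graphon(u[i], u[j]):
                adj[i][j] = adj[j][i] = 1
    return adj


def two_block_graphon(x, y):
    """Hypothetical example graphon: dense within two blocks, sparse across them."""
    same_block = (x < 0.5) == (y < 0.5)
    return 0.8 if same_block else 0.1


# Draw one synthetic 20-node graph; repeated calls give further augmented samples.
adj = sample_graph_from_graphon(two_block_graphon, n_nodes=20, seed=0)
```

Because the graphon is defined on the unit square rather than on a fixed node set, the same function can generate graphs of any size by changing `n_nodes`.
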
### Experimental results

The experiments show that training on the augmented dataset can significantly improve model performance in graph classification tasks, particularly in multi-class problems or when the classes are hard to distinguish. In addition, graphons estimated with Gromov-Wasserstein barycenters usually yield larger performance gains.

### Summary

The main contribution of the paper is a graph data augmentation framework based on graphons and Gromov-Wasserstein barycenters, which addresses the limitations of existing augmentation techniques for graph data and demonstrates its effectiveness in graph classification tasks.