Learning-based Efficient Graph Similarity Computation via Multi-Scale Convolutional Set Matching

Yunsheng Bai, Hao Ding, Yizhou Sun, Wei Wang
2021-05-17
Abstract: Graph similarity computation is one of the core operations in many graph-based applications, such as graph similarity search, graph database analysis, graph clustering, etc. Since computing the exact distance/similarity between two graphs is typically NP-hard, a series of approximate methods have been proposed with a trade-off between accuracy and speed. Recently, several data-driven approaches based on neural networks have been proposed, most of which model the graph-graph similarity as the inner product of their graph-level representations, with different techniques proposed for generating one embedding per graph. However, using one fixed-dimensional embedding per graph may fail to fully capture graphs in varying sizes and link structures, a limitation that is especially problematic for the task of graph similarity computation, where the goal is to find the fine-grained difference between two graphs. In this paper, we address the problem of graph similarity computation from another perspective, by directly matching two sets of node embeddings without the need to use fixed-dimensional vectors to represent whole graphs for their similarity computation. The model, GraphSim, achieves the state-of-the-art performance on four real-world graph datasets under six out of eight settings (here we count a specific dataset and metric combination as one setting), compared to existing popular methods for approximate Graph Edit Distance (GED) and Maximum Common Subgraph (MCS) computation.
Machine Learning
What problem does this paper attempt to address?
This paper addresses how to compute the similarity or distance between two graphs both efficiently and accurately. Classical measures such as Graph Edit Distance (GED) and Maximum Common Subgraph (MCS) are well defined, but computing them exactly is typically NP-hard, which makes them impractical at scale. Recent data-driven methods based on neural networks offer approximate solutions, but most of them represent each graph with a single fixed-dimensional embedding and score a pair of graphs by the inner product of those embeddings. Such an embedding may fail to fully capture graphs of varying sizes and link structures, which is especially problematic when the task requires a fine-grained comparison of two graphs. To address this, the paper proposes GraphSim, which computes similarity by directly matching the two sets of node embeddings instead of compressing each graph into one vector. The approach is intended to handle graphs of differing sizes and structures better, and the authors report state-of-the-art performance under six out of eight settings (each setting being one dataset and metric combination) on four real-world graph datasets, compared to existing popular methods for approximate GED and MCS computation.
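The contrast between one embedding per graph and matching two sets of node embeddings can be illustrated with a small sketch. This is not the GraphSim architecture. It assumes hand-written 2-dimensional node embeddings, mean pooling as the graph-level readout, and plain inner products as the node-node similarity. The function names (`mean_pool`, `graph_level_score`, `node_similarity_matrix`) are hypothetical and chosen only for illustration.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_pool(nodes: List[Vector]) -> Vector:
    """Collapse a set of node embeddings into one graph-level vector.

    Mean pooling is an assumption made for this sketch; it stands in for
    whatever readout a single-embedding model might use.
    """
    n = len(nodes)
    return [sum(column) / n for column in zip(*nodes)]


def graph_level_score(g1: List[Vector], g2: List[Vector]) -> float:
    """Similarity as the inner product of two graph-level embeddings."""
    return dot(mean_pool(g1), mean_pool(g2))


def node_similarity_matrix(g1: List[Vector], g2: List[Vector]) -> List[List[float]]:
    """Pairwise node-node similarities; entry [i][j] compares node i of g1
    with node j of g2. The graphs may have different numbers of nodes."""
    return [[dot(u, v) for v in g2] for u in g1]


# Toy node embeddings (illustrative values, not taken from the paper).
graph_a = [[1.0, 0.0], [0.0, 1.0]]   # two clearly different nodes
graph_b = [[0.5, 0.5], [0.5, 0.5]]   # two identical nodes

# Both graphs pool to the same vector, so the graph-level score
# cannot tell "a vs a" apart from "a vs b".
score_aa = graph_level_score(graph_a, graph_a)
score_ab = graph_level_score(graph_a, graph_b)

# The node-level similarity matrices keep the difference visible.
matrix_aa = node_similarity_matrix(graph_a, graph_a)
matrix_ab = node_similarity_matrix(graph_a, graph_b)
```

In this toy case the two graph-level scores are equal while the two similarity matrices differ, which is the kind of fine-grained information a single pooled vector can lose. How GraphSim turns node-level matching into a final similarity score is described in the paper itself and is not reproduced here.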