Approximating the Graph Edit Distance with Compact Neighborhood Representations

Franka Bause, Christian Permann, Nils M. Kriege
2023-12-07
Abstract: The graph edit distance is used for comparing graphs in various domains. Due to its high computational complexity, it is primarily approximated. Widely-used heuristics search for an optimal assignment of vertices based on the distance between local substructures. While faster ones only consider vertices and their incident edges, leading to poor accuracy, other approaches require computationally intense exact distance computations between subgraphs. Our new method abstracts local substructures to neighborhood trees and compares them using efficient tree matching techniques. This results in a ground distance for mapping vertices that yields high quality approximations of the graph edit distance. By limiting the maximum tree height, our method supports steering between more accurate results and faster execution. We thoroughly analyze the running time of the tree matching method and propose several techniques to accelerate computation in practice. We use compressed tree representations, recognize redundancies by tree canonization and exploit them via caching. Experimentally, we show that our method provides a significantly improved trade-off between running time and approximation quality compared to existing state-of-the-art approaches.
Data Structures and Algorithms
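The compression and caching ideas in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical encoding of rooted unordered trees as nested `(label, children)` tuples and uses a simple canonical form (sort the canonicalized children, in the spirit of the classic AHU tree-isomorphism encoding); it is not the paper's data structure. Isomorphic trees receive equal canonical forms, so identical subtrees need to be stored only once, and a distance cache keyed by canonical forms evaluates each pair of isomorphic inputs only once. The names `canonical`, `distinct_subtrees`, and `cached` are hypothetical.

```python
from functools import lru_cache

# Hypothetical encoding: a rooted unordered tree is a nested tuple
# (label, (child, child, ...)); the order of the children carries no meaning.


def canonical(t):
    """Canonical form: canonicalize the children recursively, then sort them."""
    label, children = t
    return (label, tuple(sorted(canonical(c) for c in children)))


def distinct_subtrees(t, seen=None):
    """Canonical forms of all subtrees of t. Their number equals the vertex count
    of a compressed representation that stores isomorphic subtrees only once."""
    if seen is None:
        seen = set()
    c = canonical(t)
    seen.add(c)
    for child in c[1]:
        distinct_subtrees(child, seen)
    return seen


def cached(dist):
    """Wrap a tree distance so that it runs once per pair of canonical forms."""

    @lru_cache(maxsize=None)
    def on_canonical(a, b):
        return dist(a, b)

    def wrapper(t1, t2):
        return on_canonical(canonical(t1), canonical(t2))

    return wrapper


if __name__ == "__main__":
    t = ("C", (("O", (("H", ()),)), ("H", ()), ("H", ())))
    print(len(distinct_subtrees(t)))  # 3 distinct subtrees among 5 vertices
```

Recomputing `canonical` at every level makes `distinct_subtrees` quadratic in the tree size; a practical implementation would assign integer IDs to canonical subtrees bottom-up instead.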
What problem does this paper attempt to address?
This paper addresses the high computational complexity of computing the graph edit distance (GED). GED is an important measure of the similarity between two graphs and is widely used in fields such as web content mining, pattern recognition, and molecular property prediction. However, computing GED exactly is NP-hard, which makes exact computation too time-consuming for most practical applications, so approximation methods are usually required.

### Core problems of the paper

1. **Computational complexity**: Exact GED computation is NP-hard, and in practice exact solvers are feasible only for small graphs.
2. **Limitations of existing methods**:
   - Fast methods consider only vertices and their incident edges, which results in low accuracy.
   - More accurate methods require exact GED computations between subgraphs, which incur high time and space costs.

### Solutions proposed in the paper

To address these problems, the paper proposes a new method for approximating GED, consisting of the following steps:

1. **Introducing neighborhood trees**: The local structure around each vertex is abstracted into a neighborhood tree, which captures structural information beyond the vertex and its incident edges.
2. **Using the structure- and depth-preserving tree edit distance (SDTED)**: Neighborhood trees are compared with SDTED, which yields the cost matrix for mapping vertices; an optimal vertex assignment based on this cost matrix then gives the GED approximation.
3. **Optimizing computational efficiency**:
   - Limiting the maximum height of the neighborhood trees steers the trade-off between accuracy and speed.
   - Compressed tree representations, together with tree canonization and caching, accelerate the computation in practice.

### Main contributions

- A new GED approximation method based on neighborhood trees and SDTED.
- A compact neighborhood tree representation that reduces redundant information.
- An analysis of the running time of SDTED, together with techniques that accelerate its computation in practice.
- Experiments showing that the new method provides a significantly better trade-off between running time and approximation quality than existing state-of-the-art approaches.

### Formula presentation

The GED is defined as

\[
\text{GED}(G, H) = \min_{(e_1, e_2, \ldots, e_k) \in \Upsilon(G, H)} \sum_{i = 1}^{k} c(e_i),
\]

where \(\Upsilon(G, H)\) is the set of all edit paths from graph \(G\) to graph \(H\), and \(c(e_i)\) is the cost of the \(i\)-th edit operation.

SDTED is defined as

\[
\text{SDTED}(T, T') := \min \{ c'(M) \mid M \in \text{SDM}(T, T') \},
\]

where \(\text{SDM}(T, T')\) is the set of all structure- and depth-preserving mappings between \(T\) and \(T'\), and the cost \(c'(M)\) of a mapping \(M\) is

\[
c'(M) = \sum_{(u, v) \in M} c(\mu(u), \mu(v)) + \sum_{u \in M_1} c(u, \epsilon) + \sum_{v \in M_2} c(\epsilon, v).
\]

The first sum covers the vertex pairs mapped onto each other, with \(\mu\) assigning labels to vertices; the second and third sums charge the deletion of the vertices of \(T\) and the insertion of the vertices of \(T'\) that are not covered by \(M\) (denoted here by \(M_1\) and \(M_2\)), where \(\epsilon\) denotes the empty element.

Overall, the paper provides an effective GED approximation scheme that substantially improves computational efficiency while maintaining high approximation quality.
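To make the GED definition concrete, here is a brute-force sketch for tiny vertex-labeled graphs. It assumes uniform unit costs for every edit operation (vertex and edge insertion, deletion, and relabeling) and unlabeled edges; the graph encoding and the names `induced_cost` and `exact_ged` are hypothetical. Instead of enumerating the edit paths in \(\Upsilon(G, H)\) directly, it relies on the standard equivalence that, for edit costs satisfying the triangle inequality (such as these unit costs), the GED equals the minimum cost of the edit path induced by a bijection between the two vertex sets padded with dummy vertices. The enumeration is factorial in the number of vertices, which illustrates why exact GED computation is impractical beyond very small graphs.

```python
from itertools import permutations

# Hypothetical encoding: a graph is (labels, edges), where labels[i] is the label
# of vertex i and edges is a set of frozenset({i, j}) (undirected, unlabeled).
# Assumption: every vertex/edge insertion, deletion, or relabeling costs 1.


def induced_cost(g, h, pi):
    """Cost of the edit path induced by a bijection pi between padded vertex sets.

    Vertices 0..n-1 of g are real and n..n+m-1 are dummies (epsilon); likewise
    h has real vertices 0..m-1. pi[i] = j maps padded vertex i of g to j of h.
    """
    (gl, ge), (hl, he) = g, h
    n, m = len(gl), len(hl)
    cost = 0
    for i, j in enumerate(pi):
        if i < n and j < m:
            cost += int(gl[i] != hl[j])  # substitution, free if labels agree
        elif i < n or j < m:
            cost += 1  # deletion (j is a dummy) or insertion (i is a dummy)
    inv = {j: i for i, j in enumerate(pi)}
    for e in ge:  # an edge of g is deleted unless its image is an edge of h
        cost += int(frozenset(pi[a] for a in e) not in he)
    for e in he:  # an edge of h is inserted unless its preimage is an edge of g
        cost += int(frozenset(inv[b] for b in e) not in ge)
    return cost


def exact_ged(g, h):
    """Exact unit-cost GED: minimum induced cost over all bijections."""
    size = len(g[0]) + len(h[0])
    return min(induced_cost(g, h, pi) for pi in permutations(range(size)))


if __name__ == "__main__":
    path = (["C", "C", "O"], {frozenset((0, 1)), frozenset((1, 2))})
    triangle = (["C", "C", "O"], path[1] | {frozenset((0, 2))})
    print(exact_ged(path, triangle))  # 1: insert the edge {0, 2}
```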
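The three solution steps and the SDTED definition can be sketched end to end as follows. This is a minimal sketch under several stated assumptions, not the paper's implementation: unit edit costs; height-\(k\) unfolding trees (the classic Weisfeiler-Leman unfolding) stand in for the paper's neighborhood trees, whose exact construction is not reproduced here, and unfolding trees can grow exponentially with \(k\), which is the kind of redundancy a compact representation aims to reduce; SDTED is computed by the top-down recursion that follows from reading "structure-preserving" as "a vertex may be mapped only if its parent is mapped to the parent of its image"; vertex deletion and insertion entries of the cost matrix are the tree sizes (a hypothetical choice); and a brute-force assignment replaces a Hungarian-algorithm solver such as `scipy.optimize.linear_sum_assignment`. All function names are hypothetical. Whatever the cost matrix, the returned value is the cost of the edit path induced by the chosen vertex mapping, so it is always an upper bound on the exact GED.

```python
from functools import lru_cache
from itertools import permutations

# Hypothetical encodings: a graph is (labels, adj) with adj[v] listing the
# neighbors of v; a rooted unordered tree is a nested tuple (label, (child, ...)).
# Assumption: unit costs for every vertex/edge insertion, deletion, relabeling.


def unfolding_tree(labels, adj, v, k):
    """Height-k unfolding tree of v; a stand-in for the paper's neighborhood tree."""
    if k == 0:
        return (labels[v], ())
    kids = sorted(unfolding_tree(labels, adj, u, k - 1) for u in adj[v])
    return (labels[v], tuple(kids))


def size(t):
    return 1 + sum(size(c) for c in t[1])


def assign(cost, dele, ins):
    """Brute-force min-cost assignment with epsilon padding (tiny inputs only).

    Row i < n goes to column pi[i]; a column >= m means row i is deleted, and a
    padded row i >= n sent to a column j < m means column j is inserted.
    """
    n, m = len(dele), len(ins)
    best_total, best_pi = float("inf"), None
    for pi in permutations(range(n + m)):
        total = 0
        for i, j in enumerate(pi):
            if i < n and j < m:
                total += cost[i][j]
            elif i < n:
                total += dele[i]
            elif j < m:
                total += ins[j]
        if total < best_total:
            best_total, best_pi = total, pi
    return best_total, best_pi


@lru_cache(maxsize=None)
def sdted(t1, t2):
    """Top-down SDTED: the roots are mapped onto each other (never worse under
    unit costs), child subtrees are matched by an assignment, and unmatched
    child subtrees are deleted or inserted as a whole (cost = their size)."""
    c1, c2 = t1[1], t2[1]
    sub = [[sdted(a, b) for b in c2] for a in c1]
    rest, _ = assign(sub, [size(a) for a in c1], [size(b) for b in c2])
    return int(t1[0] != t2[0]) + rest


def edges(adj):
    return {frozenset((v, u)) for v, nbrs in enumerate(adj) for u in nbrs}


def approx_ged(g, h, k):
    """Upper bound on the unit-cost GED from an assignment of height-k trees."""
    (gl, ga), (hl, ha) = g, h
    n, m = len(gl), len(hl)
    tg = [unfolding_tree(gl, ga, v, k) for v in range(n)]
    th = [unfolding_tree(hl, ha, v, k) for v in range(m)]
    cost = [[sdted(a, b) for b in th] for a in tg]
    # Assumption: deleting or inserting a vertex costs the size of its tree.
    _, pi = assign(cost, [size(a) for a in tg], [size(b) for b in th])
    # Cost of the edit path induced by pi: vertex operations, then edges.
    inv = {j: i for i, j in enumerate(pi)}
    ge, he = edges(ga), edges(ha)
    total = sum(int(gl[i] != hl[j]) if i < n and j < m else int(i < n or j < m)
                for i, j in enumerate(pi))
    total += sum(frozenset(pi[a] for a in e) not in he for e in ge)
    total += sum(frozenset(inv[b] for b in e) not in ge for e in he)
    return total


if __name__ == "__main__":
    # The same 3-vertex path, numbered with the center at 0 and at 1, respectively.
    g = (["C", "C", "C"], [[1, 2], [0], [0]])
    h = (["C", "C", "C"], [[1], [0, 2], [1]])
    print(approx_ged(g, h, 0), approx_ged(g, h, 1))  # 2 0
```

On this pair of isomorphic graphs, the height-0 cost matrix is all zeros, so the assignment keeps the first enumerated mapping, which confuses the center of the path with an endpoint, and the induced edit path costs 2. With height 1, the trees separate the center from the endpoints and the bound reaches the exact value 0. Larger heights capture more structure at a higher cost, mirroring the accuracy/speed trade-off controlled by the maximum tree height.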