Efficient Search in Graph Edit Distance: Metric Search Trees vs. Brute Force Verification

Wenqi Marshall Guo, Jeffrey Uhlmann
2024-03-15
Abstract: This report evaluates the efficiency of Graph Edit Distance (GED) computation for graph similarity search (GSS), comparing Cascading Metric Trees (CMT) with brute-force verification. Despite the anticipated advantages of CMT, our findings indicate it does not consistently outperform brute-force methods in speed. The study, based on graph data from PubChem, suggests that the computational complexity of GED-based GSS remains a challenge.
Databases, Information Retrieval
What problem does this paper attempt to address?
The problem that this paper attempts to solve is: **How to improve the efficiency of Graph Similarity Search (GSS) based on Graph Edit Distance (GED)**.

### Problem Background

Graph similarity search plays a crucial role in multiple fields, especially in molecular and protein similarity searches. GED is a commonly used measure of the similarity between two graphs, defined as the minimum number of edit steps required to transform one graph into the other. However, calculating the exact GED usually requires exponential time, which significantly slows down the search process.

### Existing Methods

1. **Brute Force Verification**:
   - Use a lower bound to filter graphs in the database, and only graphs with a lower bound less than the threshold will be retained.
   - Then, instead of calculating the exact GED, verify whether the GED of two graphs is below a given threshold.
2. **Methods based on metric search trees (such as Cascading Metric Tree, CMT)**:
   - CMT has been shown to enhance similarity searches using different metrics (such as Euclidean distance and Kendall-Tau distance).
   - This paper investigates whether CMT can be optimized by Upper-and-Lower Bounds (UBLB) to improve the efficiency of GSS based on GED.

### Research Objectives

The main objective of this paper is to explore whether CMT performs better than the simple brute-force verification method in GED-based search. Specifically, the authors aim to optimize CMT by introducing UBLB and to measure the resulting performance in graph similarity search.

### Main Findings

The experimental results show that although CMT performs well on other metrics, it does not significantly outperform the brute-force verification method in GED-based search. In fact, in many cases, CMT is significantly slower than brute-force verification. Possible reasons include:

- Even with UBLB, the calculation of GED is still very expensive.
- For a smaller query radius (a common situation in practical applications), brute-force verification can be completed very quickly.

These findings emphasize the importance of continuing to explore methods for accelerating GED calculation and verification, especially optimizing the calculation of GED upper and lower bounds, and evaluating the performance of these methods on a wider range of datasets and search spaces.

### Summary

The research in this paper reveals the limitations of CMT in GED-based search and provides directions for future research. Although CMT may perform better in some cases, the current results indicate that brute-force verification is still an efficient option. Future work should focus on optimizing the implementation of CMT, expanding datasets, and exploring more effective GED calculation methods.
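
To make the filter-then-verify idea described under Brute Force Verification concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation. It assumes unit edit costs, node-labeled graphs with unlabeled edges and no self-loops, a simple label-multiset lower bound, and a naive permutation-based verifier that is only practical for very small graphs. It also treats a graph as a hit when its GED is at most the threshold. The function names (`make_graph`, `label_lower_bound`, `verify`, `search`) and the toy graphs are hypothetical and are not taken from PubChem.

```python
from collections import Counter
from itertools import permutations


def make_graph(labels, edges):
    """Graph = (list of node labels, set of undirected edges as frozensets)."""
    return list(labels), {frozenset(e) for e in edges}


def label_lower_bound(g1, g2):
    """Cheap lower bound on unit-cost GED from label and edge counts."""
    (l1, e1), (l2, e2) = g1, g2
    common = sum((Counter(l1) & Counter(l2)).values())
    node_part = max(len(l1), len(l2)) - common  # relabels/insertions/deletions
    edge_part = abs(len(e1) - len(e2))          # edge insertions/deletions
    return node_part + edge_part


def verify(g1, g2, tau):
    """Return True as soon as some node mapping costs at most tau.

    This checks GED <= tau without computing the exact GED. The smaller
    graph is padded with dummy nodes (label None), so real labels must
    not be None. Runtime is factorial in the node count.
    """
    (l1, e1), (l2, e2) = g1, g2
    n = max(len(l1), len(l2))
    p1 = l1 + [None] * (n - len(l1))
    p2 = l2 + [None] * (n - len(l2))
    for perm in permutations(range(n)):
        node_cost = sum(p1[i] != p2[perm[i]] for i in range(n))
        if node_cost > tau:
            continue  # this mapping already exceeds the threshold
        mapped = {frozenset(perm[u] for u in e) for e in e1}
        if node_cost + len(mapped ^ e2) <= tau:
            return True
    return False


def search(query, database, tau):
    """Filter by lower bound, then verify only the surviving candidates."""
    hits = []
    for name, graph in database.items():
        if label_lower_bound(query, graph) > tau:
            continue  # pruned without any GED work
        if verify(query, graph, tau):
            hits.append(name)
    return hits


path = [(0, 1), (1, 2)]
query = make_graph(["C", "C", "O"], path)
database = {
    "same": make_graph(["C", "C", "O"], path),
    "relabel": make_graph(["C", "C", "C"], path),
    "ring": make_graph(["C", "C", "O"], path + [(0, 2)]),
    "swapped": make_graph(["C", "O", "C"], path),
    "big": make_graph(["N", "N", "N", "N"], path + [(2, 3)]),
}

print(search(query, database, tau=1))  # ['same', 'relabel', 'ring']
```

In this toy run, `big` is discarded by the lower bound alone, while `swapped` passes the filter (its lower bound is 0) and is rejected only by verification. This mirrors why the cost of verification, rather than filtering, tends to dominate.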