Improving Performance of Graph Similarity Joins Using Selected Substructures

Xiang Zhao, Chuan Xiao, Wenjie Zhang, Xuemin Lin, Jiuyang Tang
DOI: https://doi.org/10.1007/978-3-319-05810-8_11
2014-01-01
Abstract: Similarity join of complex structures is an important operation in managing graph data. In this paper, we investigate the problem of graph similarity join with edit distance constraints. Existing algorithms extract substructures – either rooted trees or simple paths – as features, and transform the edit distance constraint into a weaker count filtering condition. However, their performance suffers from heavy overlapping or low selectivity of the substructures. To resolve this issue, we first present a general framework for substructure-based similarity join and a tighter count filtering condition. Under this framework, we observe that using either too few or too many substructures can result in poor filtering performance. We therefore devise an algorithm to select substructures for filtering. The proposed techniques are integrated into the framework, constituting a new algorithm whose superiority is confirmed by experimental results.
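To make the idea of a count filtering condition concrete, here is a minimal sketch. It is not the paper's tighter condition or its substructure selection algorithm. It assumes simple undirected labeled graphs and uses single labeled edges as the substructures, which is a simplification chosen for illustration. Under those assumptions, one edit operation can destroy at most max(1, maximum degree) edge features, so two graphs within edit distance tau must share at least a certain number of features. All names (`edge_features`, `max_degree`, `passes_count_filter`) and the graph representation are hypothetical.

```python
from collections import Counter

# Assumed graph representation (hypothetical): a pair
# (vertex_labels, edges), where vertex_labels maps vertex id -> label and
# edges is a list of (u, v, edge_label) for a simple undirected graph.

def edge_features(vertex_labels, edges):
    """Return the multiset of label triples, one per undirected edge."""
    feats = Counter()
    for u, v, elabel in edges:
        # Sort the endpoint labels so the triple ignores edge direction.
        a, b = sorted((vertex_labels[u], vertex_labels[v]))
        feats[(a, elabel, b)] += 1
    return feats

def max_degree(vertex_labels, edges):
    """Return the largest vertex degree (0 for a graph without edges)."""
    deg = Counter()
    for u, v, _ in edges:
        deg[u] += 1
        deg[v] += 1
    return max(deg.values(), default=0)

def passes_count_filter(g1, g2, tau):
    """Return False only if the edit distance certainly exceeds tau.

    Each edit operation destroys at most D = max(1, max degree) edge
    features of a graph, so graphs within distance tau share at least
    max(|F1| - tau * D1, |F2| - tau * D2) features.
    """
    f1, f2 = edge_features(*g1), edge_features(*g2)
    d1 = max(1, max_degree(*g1))
    d2 = max(1, max_degree(*g2))
    lower_bound = max(sum(f1.values()) - tau * d1,
                      sum(f2.values()) - tau * d2)
    common = sum((f1 & f2).values())  # multiset intersection size
    return common >= lower_bound

# Two three-vertex paths that differ in one vertex label.
g1 = ({0: "C", 1: "C", 2: "O"}, [(0, 1, "s"), (1, 2, "s")])
g2 = ({0: "C", 1: "C", 2: "N"}, [(0, 1, "s"), (1, 2, "s")])
print(passes_count_filter(g1, g2, tau=0))  # False: pair is pruned
print(passes_count_filter(g1, g2, tau=1))  # True: pair stays a candidate
```

A pair that passes the filter is only a candidate and still needs an exact edit distance check. The sketch also hints at the trade-off the abstract describes: in this simplified setting, features that overlap heavily at high-degree vertices inflate the per-operation bound and weaken the filter.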