Extend Tree Edit Distance for Effective Object Identification

Yue Wang,Hongzhi Wang,Liyan Zhang,Yang Wang,Jianzhong Li,Hong Gao
DOI: https://doi.org/10.1007/s10115-014-0816-1
IF: 2.7
2015-01-01
Knowledge and Information Systems
Abstract:Similarity join on XML documents which are usually modeled as rooted ordered labeled trees is widely applied, due to the ambiguity of references to the real-world objects. The conventional method dealing with this issue is based on tree edit distance, which is shortage of flexibility and efficiency. In this paper, we propose two novel edit operations together with extended tree edit distance, which can achieve good performance in similarity matching with hierarchical data structures [the run-time is \(O(n^{3})\) in the worst case]. And then, we propose \(k\)-generation set distance as a good approximation of the tree edit distance to further improve the join efficiency with quadric time complexity. Experiments on real and synthetic databases demonstrate the benefit of our method in efficiency and scalability.
What problem does this paper attempt to address?