GRAMS(3): An Efficient Framework for XML Structural Similarity Search

Peisen Yuan, Xiaoling Wang, Chaofeng Sha, Ming Gao, Aoying Zhou
DOI: https://doi.org/10.1007/978-3-642-14589-6_43
2010-01-01
Abstract: Structural similarity search is a fundamental technology for XML data management. However, existing methods do not scale well to large volumes of XML documents. The pq-gram is an efficient way of extracting substructures from tree-structured data for approximate structural similarity search. In this paper, we propose GRAMS(3), an effective framework for evaluating the structural similarity of XML data. First, the pq-grams of each XML document are extracted; then we study the characteristics of the pq-grams of XML and generate a doc-gram vector for each XML tree using the TGF-IGF model; finally, we employ locality-sensitive hashing for efficient structural similarity search over XML documents. An empirical study using both synthetic and real datasets demonstrates that the framework is efficient.
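The three stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions: the abstract does not define TGF-IGF, so the weighting below is a TF-IDF-style stand-in (gram count times a smoothed inverse document frequency); the abstract does not name an LSH family, so a SimHash-style sign signature is swapped in; and the function names `pq_grams`, `weighted_vectors`, `signature`, and `hamming`, the `(label, [children])` tree encoding, and the defaults p=2, q=3 are all hypothetical.

```python
import hashlib
import math
from collections import Counter

NULL = "*"  # placeholder label for the padding nodes of the extended tree


def pq_grams(tree, p=2, q=3):
    """Return the pq-gram label tuples of a tree given as (label, [children])."""
    grams = []

    def visit(node, ancestors):
        label, children = node
        # Stem: the node itself plus its p-1 nearest ancestors (null-padded).
        stem = (ancestors + (label,))[-p:]
        if not children:
            # A leaf contributes one gram whose base is q null nodes.
            grams.append(stem + (NULL,) * q)
            return
        # Base: every window of q consecutive children, with q-1 null
        # nodes padded on each side of the child list.
        pad = (NULL,) * (q - 1)
        base = pad + tuple(child[0] for child in children) + pad
        for i in range(len(base) - q + 1):
            grams.append(stem + base[i:i + q])
        for child in children:
            visit(child, stem)

    visit(tree, (NULL,) * p)
    return grams


def weighted_vectors(docs, p=2, q=3):
    """ASSUMPTION: TF-IDF-style weights standing in for the TGF-IGF model."""
    profiles = [Counter(pq_grams(doc, p, q)) for doc in docs]
    doc_freq = Counter(gram for profile in profiles for gram in profile)
    n = len(docs)
    return [
        {gram: count * math.log(1 + n / doc_freq[gram])
         for gram, count in profile.items()}
        for profile in profiles
    ]


def signature(vector, bits=32):
    """ASSUMPTION: SimHash-style signature as the locality-sensitive hash."""
    sig = 0
    for i in range(bits):
        total = 0.0
        for gram, weight in vector.items():
            # Deterministic pseudo-random sign per (bit, gram) pair.
            digest = hashlib.md5(f"{i}|{'/'.join(gram)}".encode()).digest()
            total += weight if digest[0] & 1 else -weight
        if total >= 0:
            sig |= 1 << i
    return sig


def hamming(a, b):
    """Number of differing signature bits; smaller suggests more similar."""
    return bin(a ^ b).count("1")
```

Under these assumptions, two documents with identical pq-gram profiles receive identical signatures, and candidate pairs can be filtered by comparing signatures with `hamming` before computing an exact similarity.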