Abstract: With the growth of XML applications such as digital libraries, XML publish/subscribe systems, and other XML repositories, top-k structural similarity search over XML documents is attracting increasing attention. The similarity of two XML documents can be measured by the edit distance between their XML trees, as defined in previous work. Because computing edit distances is time consuming, recent work has proposed using structural summaries to speed up the computation. However, most existing algorithms for computing the edit distance between trees ignore the fact that nodes in a tree may differ in significance, and they inappropriately assume the same edit operation costs for all nodes in an XML document tree. This paper addresses this problem by proposing a summary structure that makes the tree-based edit distance more rational; furthermore, a novel weighting scheme is proposed to indicate that some nodes are more important than others with respect to structural similarity. We introduce a new cost model for computing structural distance that takes node weights into account. Compared with earlier techniques, our approach can efficiently answer top-k queries approximately. We verify this approach through a series of experiments, and the results show that using weighted structural summaries for top-k queries is efficient and practical.
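To make the idea of weighted structural summaries concrete, the following is a minimal sketch and not the summary structure or cost model of this paper, which the abstract does not specify. It assumes a deliberately crude summary (the set of distinct root-to-node label paths), a hypothetical weighting in which a path at depth d has weight `decay ** d`, and a distance that sums the weights of paths found in only one of the two documents. The names `path_weights`, `weighted_distance`, `top_k`, and `decay` are illustrative only.

```python
import heapq
import xml.etree.ElementTree as ET


def path_weights(xml_text, decay=0.5):
    """Summarize a document as {root-to-node label path: weight}.

    Assumption: a path at depth d (root is depth 0) gets weight decay**d,
    so nodes near the root count more than deeply nested ones.
    """
    root = ET.fromstring(xml_text)
    weights = {}
    stack = [(root, (root.tag,))]
    while stack:
        node, path = stack.pop()
        weights[path] = decay ** (len(path) - 1)
        for child in node:
            stack.append((child, path + (child.tag,)))
    return weights


def weighted_distance(a, b):
    """Sum the weights of the paths that occur in exactly one summary."""
    only_a = sum(w for p, w in a.items() if p not in b)
    only_b = sum(w for p, w in b.items() if p not in a)
    return only_a + only_b


def top_k(query_xml, docs, k):
    """Return the k (distance, name) pairs closest to the query document."""
    q = path_weights(query_xml)
    scored = (
        (weighted_distance(q, path_weights(text)), name)
        for name, text in docs.items()
    )
    return heapq.nsmallest(k, scored)


docs = {
    "same": "<a><b/><c/></a>",
    "missing_leaf": "<a><b/></a>",
    "renamed_root": "<x><b/><c/></x>",
    "extra_deep": "<a><b><d/></b><c/></a>",
}
result = top_k("<a><b/><c/></a>", docs, k=2)
print(result)
```

In this sketch a renamed root changes every path and is penalized heavily, while an extra node two levels down costs only `0.25`, which mirrors the claim that some nodes matter more than others for structural similarity. A real system would replace the path-set comparison with a tree edit distance over the summaries.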
