Research of XML document similarity based on edit graph

Peijuan XU, Fuhui QI, Zhuo LI, Limin WANG
DOI: https://doi.org/10.3778/j.issn.1002-8331.1401-0252
2016-01-01
Abstract: Many algorithms for comparing the similarity of XML documents have been proposed, and edit distance (ED) based methods form one of the most important classes. Owing to its high efficiency, the edit graph algorithm has become the basis of many ED algorithms. The article first introduces the idea of the edit graph. Because the edit graph algorithm depends strongly on the order of sibling nodes at the same level during sorting, it is neither accurate nor effective for comparing the similarity of data-centric XML documents. To resolve this problem, a splitting edit graph algorithm based on the edit graph and a path algorithm is presented. Experimental results show that the splitting edit graph algorithm reduces the edit graph algorithm's dependence on the order of siblings at the same level, is better suited to comparing the similarity of data-centric XML documents, and produces more accurate and effective results.
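To make the sibling-order dependence concrete, the following is a minimal sketch of a generic, simplified edit-graph distance. It is not the authors' algorithm and not the proposed splitting edit graph algorithm. Assumptions (all hypothetical): each tree is flattened into a preorder sequence of (label, depth) pairs; insertions, deletions and relabels each cost 1; diagonal (match or relabel) edges exist only between nodes at the same depth. Real edit-graph algorithms add further constraints so that every path corresponds to a valid tree edit script, and those are omitted here. The names `preorder` and `edit_graph_distance` and the sample documents are illustrative only.

```python
from typing import List, Tuple

# Hypothetical minimal tree representation: (label, list of child trees).
Tree = Tuple[str, list]


def preorder(tree: Tree, depth: int = 0) -> List[Tuple[str, int]]:
    """Flatten a tree into a preorder list of (label, depth) pairs."""
    label, children = tree
    out = [(label, depth)]
    for child in children:
        out.extend(preorder(child, depth + 1))
    return out


def edit_graph_distance(a: Tree, b: Tree) -> int:
    """Shortest path through a simplified edit graph with unit costs.

    Simplified sketch: omits the extra edge constraints that full
    edit-graph algorithms use to guarantee a valid tree edit script.
    """
    A, B = preorder(a), preorder(b)
    m, n = len(A), len(B)
    INF = float("inf")
    # dist[i][j] = cheapest path from grid point (0, 0) to (i, j)
    dist = [[INF] * (n + 1) for _ in range(m + 1)]
    dist[0][0] = 0
    for i in range(m + 1):
        for j in range(n + 1):
            if i > 0:  # vertical edge: delete A[i-1]
                dist[i][j] = min(dist[i][j], dist[i - 1][j] + 1)
            if j > 0:  # horizontal edge: insert B[j-1]
                dist[i][j] = min(dist[i][j], dist[i][j - 1] + 1)
            if i > 0 and j > 0 and A[i - 1][1] == B[j - 1][1]:
                # diagonal edge only between nodes at the same depth:
                # free if labels match, otherwise a relabel costing 1
                cost = 0 if A[i - 1][0] == B[j - 1][0] else 1
                dist[i][j] = min(dist[i][j], dist[i - 1][j - 1] + cost)
    return int(dist[m][n])


# Two hypothetical data-centric records that differ only in sibling order.
doc1 = ("order", [("id", []), ("price", [])])
doc2 = ("order", [("price", []), ("id", [])])

print(edit_graph_distance(doc1, doc1))  # 0
print(edit_graph_distance(doc1, doc2))  # 2
```

For data-centric XML the two records carry the same information, yet the order-sensitive edit graph reports a nonzero distance of 2 (for example, two relabels, or one deletion plus one insertion). This is the kind of dependence on same-level sibling order that the abstract says the splitting edit graph algorithm is designed to reduce.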