Research of frequent pattern mining from XML data based on heterogeneous XML schema

Houqun Yang, Zhongshi He, Wenting Liang
2008-01-01
Journal of Computational Information Systems
Abstract: This paper studies a frequent pattern mining algorithm for XML data based on heterogeneous XML schemas. The schemas are first clustered by similarity, and the corresponding XML data are then modeled as labeled ordered trees. The algorithm uses the rightmost path expansion method: it starts with pattern trees that contain a single node and generates new pattern trees by adding nodes only to the rightmost path, one at a time. The number of candidate patterns stays small because the algorithm uses information from the frequent patterns discovered in the previous iteration. To improve mining efficiency, the paper uses a projected branch technique, which at the same time solves the problem of distinguishing isomorphism. Finally, the algorithm is tested on a group of XML data and the results are compared with those of other algorithms. The experimental results show that the algorithm is efficient and feasible.
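The rightmost path expansion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the preorder `(depth, label)` encoding, the function name `rightmost_extensions`, and the `labels` parameter are all assumptions made for illustration, and the sketch covers candidate generation only (no support counting, schema clustering, or projected branches).

```python
from typing import List, Tuple

# Hypothetical encoding: a labeled ordered tree is its preorder
# sequence of (depth, label) pairs, with the root at depth 0.
Node = Tuple[int, str]
Tree = List[Node]


def rightmost_extensions(pattern: Tree, labels: List[str]) -> List[Tree]:
    """Generate candidate trees by attaching one new node to the rightmost path.

    The rightmost path runs from the root to the last node in preorder.
    A new rightmost child can hang off any node on that path, so its
    depth ranges from 1 to (depth of last node) + 1. Appending the new
    node at the end of the preorder sequence keeps the tree ordered.
    """
    if not pattern:
        # Start of the search: single-node pattern trees.
        return [[(0, label)] for label in labels]

    last_depth = pattern[-1][0]
    candidates: List[Tree] = []
    for depth in range(1, last_depth + 2):
        for label in labels:
            candidates.append(pattern + [(depth, label)])
    return candidates


if __name__ == "__main__":
    # Assumed to be the labels that were frequent in the previous iteration.
    frequent_labels = ["a", "b"]
    for tree in rightmost_extensions([(0, "a"), (1, "b")], frequent_labels):
        print(tree)
```

In a full miner, each candidate would then be checked against the data trees and kept only if its support meets the threshold. Restricting `labels` to those already known to be frequent is one simple way to keep the candidate set small.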