DOM-Based Algorithm of Mining Frequent Patterns from XML Data

Genlin Ji, Suyun Wei, Peiming Bao
DOI: https://doi.org/10.3969/j.issn.1005-2615.2006.02.016
2006-01-01
Abstract: Mining XML data differs considerably from mining relational databases, because XML is semi-structured and its data form a more complicated hierarchy. This paper presents FreqtTree, an efficient algorithm for discovering all frequent patterns in XML data. The algorithm first converts the XML data into a DOM tree and then mines all frequent patterns from that tree incrementally. The key idea of FreqtTree is rightmost expansion: a pattern tree grows only by attaching new nodes to its rightmost branch. Because the algorithm reuses the frequent patterns discovered in the previous iteration, the number of candidate patterns stays small. In addition, FreqtTree uses the support of the frequent (k-1)-patterns to compute the support of each candidate k-pattern. By combining these techniques, the algorithm traverses the DOM tree only once. Finally, the algorithm is tested on a set of XML data and compared with other algorithms; the experimental results show that it is effective.
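The abstract gives no pseudocode, so the following is only a minimal Python sketch of the general rightmost-expansion idea, not the authors' FreqtTree implementation. It rests on several assumptions that the abstract does not state: a pattern is an ordered tree written as a preorder tuple of (depth, tag) pairs, a match must preserve parent-child links and sibling order, and support is the number of distinct DOM nodes at which the pattern root can match. The names `flatten`, `mine`, and `SAMPLE` are hypothetical. The DOM tree is walked once, and every later level is derived from the rightmost-leaf positions recorded for the previous level.

```python
from collections import defaultdict
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
SAMPLE = ("<lib><book><title/><author/></book>"
          "<book><title/><author/></book>"
          "<book><title/></book></lib>")

def flatten(xml_text):
    """Walk the DOM once; return element labels, parent index, child indices."""
    doc = minidom.parseString(xml_text)
    labels, parent, children = [], [], []

    def visit(node, par):
        idx = len(labels)
        labels.append(node.tagName)
        parent.append(par)
        children.append([])
        if par is not None:
            children[par].append(idx)
        for child in node.childNodes:
            if child.nodeType == child.ELEMENT_NODE:  # skip text and comments
                visit(child, idx)

    visit(doc.documentElement, None)
    return labels, parent, children

def mine(xml_text, min_support):
    """Return {pattern: support} for all patterns meeting min_support.

    Assumption: support = number of distinct nodes matching the pattern root.
    """
    labels, parent, children = flatten(xml_text)

    def support(pattern, occ):
        depth = pattern[-1][0]          # depth of the rightmost leaf
        roots = set()
        for x in occ:                   # climb from the leaf match to the root match
            for _ in range(depth):
                x = parent[x]
            roots.add(x)
        return len(roots)

    # Level 1: single-node patterns; occ holds matches of the rightmost leaf.
    level = defaultdict(set)
    for i, lab in enumerate(labels):
        level[((0, lab),)].add(i)
    level = {p: o for p, o in level.items() if support(p, o) >= min_support}

    result = {}
    while level:
        nxt = defaultdict(set)
        for pattern, occ in level.items():
            result[pattern] = support(pattern, occ)
            k = pattern[-1][0]
            for x in occ:
                # Attach the new node as a child of the rightmost leaf.
                for y in children[x]:
                    nxt[pattern + ((k + 1, labels[y]),)].add(y)
                # Or attach it as a new right sibling along the rightmost branch.
                a = x
                for d in range(k, 0, -1):
                    sibs = children[parent[a]]
                    for y in sibs[sibs.index(a) + 1:]:
                        nxt[pattern + ((d, labels[y]),)].add(y)
                    a = parent[a]
        # Keep only candidates that are themselves frequent.
        level = {p: o for p, o in nxt.items() if support(p, o) >= min_support}
    return result

if __name__ == "__main__":
    for pat, sup in mine(SAMPLE, 2).items():
        print(sup, pat)
```

Under these assumptions, candidates are generated only from patterns that were frequent at the previous level, and their rightmost-leaf positions are computed from the positions stored for those patterns, which mirrors the candidate-pruning and support-reuse ideas described in the abstract.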