Mop: an Efficient Algorithm for Mining Frequent Pattern with Subtree Traversing

Zhi-Hong Deng, Ning Gao, Xiao-Ran Xu
DOI: https://doi.org/10.3233/fi-2011-568
2011-01-01
Fundamenta Informaticae
Abstract: Mining frequent patterns in databases has emerged as an important task in knowledge discovery and data mining. In this paper, we present an efficient algorithm called Mop for fast frequent pattern discovery. Mop uses a new data structure called the OP tree (ordered pattern tree), together with particular properties of frequent patterns, to facilitate mining. An OP tree is a special frequent pattern tree in which the children of any node are sorted according to the supports of the corresponding items. Mop achieves its efficiency with three techniques: (1) it stores a large database in an OP tree to avoid repetitive database scans; (2) it finds all frequent 2-patterns during the construction of the OP tree, avoiding the costly generation of a large number of candidate 2-patterns; and (3) it obtains the supports of candidate k-patterns (k > 2) by traversing a few specific subtrees of the OP tree, which greatly reduces the search space and avoids multiple scans of the database. We experimentally compare our algorithm with the Apriori algorithm and the FP-growth algorithm on one real database and one synthetic database. The experimental results show that Mop is about an order of magnitude faster than the Apriori algorithm. Mop also outperforms the FP-growth algorithm, especially when the support threshold is very low and the databases are quite large.
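The first two techniques can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the sort direction, the tie-breaking rule, or how 2-patterns are counted, so the descending-support order, the alphabetical tie-break, and the names `Node`, `build_tree`, and `sorted_children` are all assumptions made for illustration. The sketch builds a prefix tree over the frequent items of each transaction and counts the support of every item pair during the same pass.

```python
from collections import Counter
from itertools import combinations


class Node:
    """One node of a prefix tree: an item, a count, and child nodes (hypothetical layout)."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}  # item -> Node

    def sorted_children(self, support):
        # Assumption: children are ordered by descending item support,
        # with ties broken alphabetically by item name.
        return sorted(self.children.values(),
                      key=lambda n: (-support[n.item], n.item))


def build_tree(transactions, min_support):
    # First scan: count the support of every single item.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i for i, s in support.items() if s >= min_support}

    root = Node()
    pair_support = Counter()
    # Second scan: insert each transaction and count 2-patterns on the way,
    # so no separate candidate 2-pattern generation step is needed.
    for t in transactions:
        items = sorted(set(t) & frequent, key=lambda i: (-support[i], i))
        pair_support.update(combinations(items, 2))
        node = root
        for item in items:
            node = node.children.setdefault(item, Node(item))
            node.count += 1

    frequent_pairs = {p: s for p, s in pair_support.items() if s >= min_support}
    return root, support, frequent_pairs


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
root, support, frequent_pairs = build_tree(transactions, min_support=2)
print(frequent_pairs)
```

After these two scans the tree holds the compressed database, so later support counting can work on the tree instead of rescanning the transactions. The subtree traversal for k-patterns with k > 2 is not sketched here because the abstract gives no detail on which subtrees are visited.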