HashEclat: an Efficient Frequent Itemset Algorithm

Chunkai Zhang, Panbo Tian, Xudong Zhang, Qing Liao, Zoe L. Jiang, Xuan Wang
DOI: https://doi.org/10.1007/s13042-018-00918-x
2019-01-01
International Journal of Machine Learning and Cybernetics
Abstract: The Eclat algorithm is one of the most widely used frequent itemset mining methods. However, computing the intersections of itemsets is inefficient, which makes Eclat time-consuming, especially when the dataset has a large number of transactions. In this work, to improve efficiency, we propose an approximate Eclat algorithm named HashEclat, based on MinHash. It quickly estimates the size of the intersection set, and its parameters k, E and minSup can be adjusted to trade off the accuracy of the mining results against execution time. The parameter k is the top-k parameter of the one-permutation MinHash algorithm; the parameter E is the estimation error of one intersection size; the parameter minSup is the minimum support threshold. In many real situations, an approximate result obtained faster may be more useful than an ‘exact’ result. The theoretical analysis and experimental results that we present demonstrate that the proposed algorithm can output almost all of the frequent itemsets with faster speed and less memory space.
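To make the core idea concrete, the following is a minimal sketch of estimating an intersection size from MinHash-style sketches instead of intersecting full transaction-id lists. It is an illustration under stated assumptions, not the authors' implementation: the function names (`hash_tid`, `bottom_k_sketch`, `estimate_intersection`), the choice of BLAKE2b as the hash function, and the bottom-k form of the sketch are all hypothetical choices made here. Only the parameter name k comes from the abstract.

```python
import hashlib
import heapq


def hash_tid(tid):
    """Map a transaction id to a deterministic 64-bit integer (assumed hash choice)."""
    digest = hashlib.blake2b(str(tid).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_k_sketch(tids, k):
    """Keep the k smallest hash values of a tid set under a single hash function."""
    return heapq.nsmallest(k, (hash_tid(t) for t in tids))


def estimate_intersection(sketch_a, size_a, sketch_b, size_b, k):
    """Estimate |A ∩ B| from two bottom-k sketches and the exact set sizes."""
    in_a, in_b = set(sketch_a), set(sketch_b)
    # The k smallest values of the merged sketches are the k smallest
    # hash values of A ∪ B.
    union_k = sorted(in_a | in_b)[:k]
    if not union_k:
        return 0.0
    shared = sum(1 for v in union_k if v in in_a and v in in_b)
    jaccard = shared / len(union_k)
    # |A ∩ B| = J * |A ∪ B| and |A ∪ B| = |A| + |B| - |A ∩ B|,
    # hence |A ∩ B| = J * (|A| + |B|) / (1 + J).
    return jaccard * (size_a + size_b) / (1 + jaccard)


if __name__ == "__main__":
    tids_a = set(range(0, 100))    # transactions containing itemset A
    tids_b = set(range(50, 150))   # transactions containing itemset B
    k = 256
    est = estimate_intersection(
        bottom_k_sketch(tids_a, k), len(tids_a),
        bottom_k_sketch(tids_b, k), len(tids_b),
        k,
    )
    print(round(est, 3), len(tids_a & tids_b))
```

In this usage example both tid sets are smaller than k, so the sketches hold every hash value and the estimate equals the exact intersection size of 50 up to floating-point rounding. With a smaller k, the cost of the estimate depends on k rather than on the number of transactions, at the price of an estimation error; an Eclat-style miner would then compare the estimate against minSup when deciding whether to extend a candidate itemset.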