Fast Eclat Algorithms Based On Minwise Hashing For Large Scale Transactions
Chunkai Zhang,Panbo Tian,Xudong Zhang,Zoe L. Jiang,Lin Yao,Xuan Wang
DOI: https://doi.org/10.1109/JIOT.2018.2885851
IF: 10.6
2019-01-01
IEEE Internet of Things Journal
Abstract:The Eclat algorithm is one of the most widely used frequent itemset mining methods. In the normal Eclat algorithm and its variants, it is inefficient to calculate the intersection size of itemsets by sequentially comparing elements, especially for large scale transactions. In this paper, we propose the fast Eclat algorithms that can quickly calculate the intersection size of multiple itemsets by using minwise hashing and the estimators. Minwise hashing is used to calculate the Jaccard similarity coefficient by mapping the elements of the sets to those of smaller sets. Two estimators are used to estimate the intersection size of itemsets based on the Jaccard similarity coefficient. Due to the "imperfect" hash function, minwise hashing may obtain a biased Jaccard similarity, which results in error between the real value and the estimated value of the intersection size. Thus, we proposed the HashEclat which uses the maximum of vertical bar A vertical bar and vertical bar B vertical bar to represent the union size vertical bar A boolean OR B vertical bar, and proposed the Sim-Eclat which uses the minimum of vertical bar A vertical bar and vertical bar B vertical bar to represent the intersection size vertical bar A boolean AND B vertical bar. Furthermore, we use a boundary error E for better performance as follows: if E is large, the intersection size is determined by a traditional method, and the result is more accurate but takes longer to compute; otherwise, it will be the opposite. Both the theoretical analysis and experimental results show that the proposed algorithms can obtain almost all frequent itemsets with higher speed and less memory usage than other algorithms.