PFIMD: a parallel MapReduce-based algorithm for frequent itemset mining
Mao Yimin, Geng Junhao, Deborah Simon Mwakapesa, Yaser Ahangari Nanehkaran, Zhang Chi, Deng Xiaoheng, Chen Zhigang
DOI: https://doi.org/10.1007/s00530-020-00725-x
IF: 3.9
2021-03-13
Multimedia Systems
Abstract: Frequent itemset mining (FIM) is an important data mining technique that is widely used in many applications for discovering frequent items. As datasets have grown rapidly, FIM has attracted considerable research attention, which has led to numerous FIM algorithms for the big data environment. This study designs an optimized parallel frequent itemset mining algorithm based on MapReduce, named PFIMD, to address two problems in existing parallel FIM algorithms: the high time and space complexity of processing and computing itemsets, and the failure to adequately balance the load among parallel tasks. First, a structure called DiffNodeset is adopted to avoid the growth of N-list cardinality in the MRPrePost algorithm. Then, a 2-way comparison strategy is designed to speed up DiffNodeset generation for 2-itemsets and reduce the time complexity of the algorithm. Next, the steps of the improved algorithm are parallelized using the Hadoop cloud computing platform and the MapReduce programming model. Moreover, to group the items in F-list evenly, a load balancing strategy based on dynamic grouping is proposed, which addresses the uneven load across the nodes of the cluster. The experimental results show that the modified algorithm not only overcomes the shortcomings of MRPrePost in the big data environment but also greatly reduces the time and space complexity. Finally, specific applications of the PFIMD algorithm to several multimedia datasets are listed to illustrate its generality.
computer science, information systems, theory & methods
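The abstract mentions a load balancing strategy that groups the items of F-list evenly across cluster nodes, but it does not state the grouping rule. As an illustration only, the sketch below uses a generic greedy largest-first heuristic, which is not necessarily the dynamic grouping used in PFIMD. The function name `group_items`, the sample F-list, and the use of a per-item number as a load estimate are all assumptions made for this example.

```python
import heapq


def group_items(f_list, num_groups):
    """Greedily assign (item, load) pairs to groups, heaviest items first.

    Hypothetical helper: 'load' is an assumed per-item cost estimate,
    not a quantity defined in the abstract.
    """
    # Min-heap of (current_load, group_index); the lightest group pops first.
    heap = [(0, g) for g in range(num_groups)]
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]

    # Place heavy items first so lighter items can even out the groups.
    for item, load in sorted(f_list, key=lambda pair: (-pair[1], pair[0])):
        total, g = heapq.heappop(heap)
        groups[g].append(item)
        heapq.heappush(heap, (total + load, g))

    # Read the final load of each group back from the heap.
    loads = [0] * num_groups
    for total, g in heap:
        loads[g] = total
    return groups, loads


# Assumed sample F-list: items paired with an estimated load.
f_list = [("a", 9), ("b", 7), ("c", 6), ("d", 5), ("e", 4), ("f", 2)]
groups, loads = group_items(f_list, 3)
print(groups)
print(loads)
```

In a MapReduce setting, each resulting group would typically be handed to one reducer, so that reducers receive roughly equal amounts of work; how PFIMD estimates the load and regroups dynamically is not specified in the abstract.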