Abstract: Frequent itemset mining (FIM) is an important data mining technique that is widely used in many applications to discover frequent items. As datasets have grown rapidly, FIM has attracted considerable research interest, which has led to numerous FIM algorithms for big data environments. This study designs an optimized parallel frequent itemset mining algorithm based on MapReduce, named the \({\text{PFIMD}}\) algorithm, to address two problems in existing parallel FIM algorithms: the high time and space complexity of processing and computing itemsets, and the failure to adequately balance the load among parallel tasks. First, a structure called \({\text{DiffNodeset}}\) is adopted to effectively avoid the growth of \(N{-}list\) cardinality in the \({\text{MRPrePost}}\) algorithm. Then, a 2-way comparison strategy is designed to speed up \({\text{DiffNodeset}}\) generation for 2-itemsets and reduce the time complexity of the algorithm. Next, the steps of the improved algorithm are parallelized using the Hadoop cloud computing platform and the MapReduce programming model. Moreover, to group the items in \(F{-}list\) evenly, a load balancing strategy based on dynamic grouping is proposed, which addresses the uneven load across the nodes of the cluster. The experimental results show that the modified algorithm not only overcomes the shortcomings of \({\text{MRPrePost}}\) in the big data environment, but also greatly reduces time and space complexity. Finally, specific applications of the \({\text{PFIMD}}\) algorithm to several multimedia datasets are listed to illustrate its general applicability.
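
To make the idea of grouping the items of \(F{-}list\) evenly more concrete, the following is a minimal sketch and not the dynamic grouping strategy of \({\text{PFIMD}}\) itself, whose details the abstract does not give. It assumes that an item's support count can serve as a rough estimate of its mining load and uses a simple greedy rule (assign the next heaviest item to the currently lightest group). The function names `build_f_list` and `balanced_groups` are hypothetical.

```python
import heapq
from collections import Counter


def build_f_list(transactions, min_support):
    """Return frequent items as (item, support) pairs, highest support first."""
    # Count each item at most once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [(item, c) for item, c in counts.items() if c >= min_support]
    # Sort by descending support; break ties by item name for determinism.
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    return frequent


def balanced_groups(f_list, num_groups):
    """Greedily assign items to groups so that estimated loads stay close.

    Assumption: the support count stands in for the per-item load estimate.
    """
    # Min-heap of (current_load, group_index); the lightest group is on top.
    heap = [(0, g) for g in range(num_groups)]
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]
    for item, load in f_list:  # f_list is already ordered heaviest first
        total, g = heapq.heappop(heap)
        groups[g].append(item)
        heapq.heappush(heap, (total + load, g))
    return groups


if __name__ == "__main__":
    transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a"], ["b", "d"]]
    f_list = build_f_list(transactions, min_support=2)
    print(f_list)
    print(balanced_groups(f_list, num_groups=2))
```

In a MapReduce setting, each group would typically be handled by one reduce task, so keeping the estimated loads close is what keeps the cluster nodes evenly busy. A more faithful load estimate would depend on the mining structure used, which is outside the scope of this sketch.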

PNPFI: An Efficient Parallel Frequent Itemsets Mining Algorithm

PFIMD: a parallel MapReduce-based algorithm for frequent itemset mining

Mining Noise-Tolerant Frequent Closed Itemsets in Very Large Database.

A New Algorithm for Mining Global Frequent Itemsets in a Stream.

A Stable Parallel Distributed Frequent Itemset Mining Algorithm and Its Application

YAFIM: A Parallel Frequent Itemset Mining Algorithm with Spark

GC-Tree: A Fast Online Algorithm for Mining Frequent Closed Itemsets

A New Algorithm for Fast Mining Frequent Itemsets Using N-lists

PFP: Parallel FP-Growth for Query Recommendation

A Novel Method to Generate Frequent Itemsets in Distributed Environment

TFP: an Efficient Algorithm for Mining Top-K Frequent Closed Itemsets

PrePost+

Efficient algorithms for deriving complete frequent itemsets from frequent closed itemsets

GMiner++: Boosting GPU-based frequent itemset mining by reducing redundant computations

MapReduce-based Closed Frequent Itemset Mining with Efficient Redundancy Filtering

Efficient Probabilistic Frequent Itemset Mining in Big Sparse Uncertain Data

LDP-FPMiner: FP-Tree Based Frequent Itemset Mining with Local Differential Privacy

MapReduce-Based Balanced Mining for Closed Frequent Itemset

Mining Maximum Length Frequent Itemsets: A Summary of Results

FRI-Miner: Fuzzy Rare Itemset Mining

Tree Partition Based Parallel Frequent Pattern Mining on Shared Memory Systems