Abstract: Frequent itemset mining (FIM) is an important data mining technique that is widely used in numerous applications for discovering frequent itemsets. With the rapid growth of datasets, FIM in the big data environment has attracted many researchers and has motivated a large number of new FIM algorithms. This study designs an optimized parallel frequent itemset mining algorithm based on MapReduce, named \({\text{PFIMD}}\), to address two weaknesses of existing parallel FIM algorithms: the high time and space complexity of processing and computing itemsets, and the failure to balance the load adequately among parallel tasks. First, a structure called \({\text{DiffNodeset}}\) is adopted to effectively avoid the growth of \(N{-}list\) cardinality that occurs in the \({\text{MRPrePost}}\) algorithm. Second, a 2-way comparison strategy is designed to speed up the generation of the \({\text{DiffNodeset}}\)s of 2-itemsets and to reduce the time complexity of the algorithm. Third, the steps of the improved algorithm are parallelized using the cloud computing platform Hadoop and the MapReduce programming model. Moreover, to group the items in the \(F{-}list\) uniformly, a load balancing strategy based on dynamic grouping is proposed, which solves the problem of uneven load across the nodes of the cluster. The experimental results show that the modified algorithm not only overcomes the shortcomings of \({\text{MRPrePost}}\) in the big data environment but also greatly reduces time and space complexity. Finally, applications of the \({\text{PFIMD}}\) algorithm to several multimedia datasets are presented to illustrate its generality.
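The DiffNodeset computation for 2-itemsets can be illustrated with a small, single-machine Python sketch. It is not the paper's implementation, and it rests on several assumptions. First, it assumes a prefix tree whose nodes carry pre-order and post-order codes, as in the PPC-tree used by PrePost. Second, it assumes the DiffNodeset of a 2-itemset \(ab\), where \(a\) precedes \(b\) in descending-support order, is the list of \(b\)-nodes that have no \(a\)-node as an ancestor. Under that convention, \(\mathrm{sup}(ab) = \mathrm{sup}(b) - \sum_{E \in \mathrm{DNS}_{ab}} E.\mathrm{count}\). The two-pointer merge in the hypothetical `diff_nodeset` function is one linear-time way to compare two sorted code lists. The paper's 2-way comparison strategy may differ in its details, and all function names here are hypothetical.

```python
# Hypothetical sketch: function names and the DiffNodeset convention are assumptions.
import itertools
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Code = Tuple[int, int, int]  # (pre-order code, post-order code, node count)


@dataclass
class Node:
    item: Optional[str]
    count: int = 0
    children: Dict[str, "Node"] = field(default_factory=dict)


def build_tree(transactions, min_count):
    """Prefix tree of frequent items, each path ordered by descending support."""
    freq = Counter(i for t in transactions for i in set(t))
    ranked = sorted((i for i in freq if freq[i] >= min_count), key=lambda i: (-freq[i], i))
    order = {item: rank for rank, item in enumerate(ranked)}
    root = Node(item=None)
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in order), key=order.get):
            node = node.children.setdefault(item, Node(item=item))
            node.count += 1
    return root, order


def node_lists(root):
    """Assign pre/post-order codes; return each item's node codes sorted by pre."""
    lists: Dict[str, List[Code]] = defaultdict(list)
    pre_ids, post_ids = itertools.count(), itertools.count()

    def visit(node):
        pre = next(pre_ids)
        for child in node.children.values():
            visit(child)
        post = next(post_ids)
        if node.item is not None:
            lists[node.item].append((pre, post, node.count))

    visit(root)
    for codes in lists.values():
        codes.sort()
    return lists


def diff_nodeset(a_nodes: List[Code], b_nodes: List[Code]) -> List[Code]:
    """b-nodes with no a-node ancestor, found in one pass over both sorted lists."""
    result, i, j = [], 0, 0
    while j < len(b_nodes):
        if i == len(a_nodes):
            result.extend(b_nodes[j:])  # no a-nodes left to cover the rest
            break
        a_pre, a_post, _ = a_nodes[i]
        b_pre, b_post, _ = b_nodes[j]
        if a_pre < b_pre and a_post > b_post:
            j += 1  # a is an ancestor of b, so b is covered
        elif a_pre < b_pre:
            i += 1  # a's subtree ends before b; it cannot cover any later b-node
        else:
            # Skipped a-nodes ended before b and the rest start after it.
            result.append(b_nodes[j])
            j += 1
    return result


def support_2itemset(a_nodes: List[Code], b_nodes: List[Code]) -> int:
    """sup(ab) = sup(b) minus the counts of the b-nodes in the DiffNodeset."""
    sup_b = sum(c for _, _, c in b_nodes)
    return sup_b - sum(c for _, _, c in diff_nodeset(a_nodes, b_nodes))


transactions = [["a", "b", "c"], ["a", "c"], ["b", "c", "d"], ["a", "b", "d"], ["c", "d"]]
root, order = build_tree(transactions, min_count=2)
lists = node_lists(root)
print(diff_nodeset(lists["a"], lists["b"]))      # [(4, 3, 1)]
print(support_2itemset(lists["a"], lists["b"]))  # 2
```

A node \(X\) is an ancestor of \(Y\) exactly when \(X.\mathrm{pre} < Y.\mathrm{pre}\) and \(X.\mathrm{post} > Y.\mathrm{post}\). Because nodes of the same item never nest, each list is scanned once. The example yields support 2 for \(\{a, b\}\), which matches a direct count over the five transactions.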
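The abstract names a dynamic-grouping load balancer for the \(F{-}list\) without giving its details. The following is therefore a hypothetical Python sketch of one standard approach, not the paper's strategy. It uses greedy longest-processing-time (LPT) assignment of F-list items to groups by an estimated mining cost. This is followed by a map step in the style of Parallel FP-Growth (PFP), which sends each transaction prefix to each relevant group exactly once. The cost values and the names `balanced_groups` and `emit_group_shards` are assumptions.

```python
# Hypothetical sketch: LPT grouping plus a PFP-style map step; costs are assumed.
import heapq
from typing import Dict, List, Sequence


def balanced_groups(item_costs: Dict[str, float], num_groups: int):
    """Greedy LPT: give each item (most expensive first) to the least-loaded group."""
    if num_groups < 1:
        raise ValueError("num_groups must be at least 1")
    heap = [(0, g) for g in range(num_groups)]  # (current load, group id); already a heap
    groups: List[List[str]] = [[] for _ in range(num_groups)]
    loads = [0] * num_groups
    # Ties in cost are broken by item name so the result is deterministic.
    for item, cost in sorted(item_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, g = heapq.heappop(heap)
        groups[g].append(item)
        loads[g] = load + cost
        heapq.heappush(heap, (loads[g], g))
    return groups, loads


def emit_group_shards(sorted_transaction: Sequence[str], group_of: Dict[str, int]):
    """For each group present in the transaction, emit the prefix that ends at
    that group's last item, exactly once per group."""
    emitted, out = set(), []
    for pos in range(len(sorted_transaction) - 1, -1, -1):
        g = group_of.get(sorted_transaction[pos])
        if g is not None and g not in emitted:
            emitted.add(g)
            out.append((g, list(sorted_transaction[: pos + 1])))
    return out


# Hypothetical cost estimates for an F-list ordered c, a, b, d, e, f.
costs = {"c": 4, "a": 3, "b": 3, "d": 3, "e": 2, "f": 1}
groups, loads = balanced_groups(costs, num_groups=2)
group_of = {item: g for g, items in enumerate(groups) for item in items}
print(groups, loads)  # [['c', 'd', 'f'], ['a', 'b', 'e']] [8, 8]
print(emit_group_shards(["c", "a", "d", "f"], group_of))
# [(0, ['c', 'a', 'd', 'f']), (1, ['c', 'a'])]
```

In a PFP-style job, each reducer would then mine only the items of its own group from the prefixes it receives. LPT is a simple heuristic and does not guarantee perfectly equal loads in general.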

PFIMD: a parallel MapReduce-based algorithm for frequent itemset mining