Abstract: Frequent itemset mining (FIM) is an important data mining technique that is widely used to discover frequently occurring items. As datasets have grown rapidly, FIM has attracted considerable research attention, and many FIM algorithms have been proposed for the big data environment. This study designs an optimized parallel frequent itemset mining algorithm based on MapReduce, named \({\text{PFIMD}}\), to address two problems in existing parallel FIM algorithms: the high time and space complexity of processing and computing itemsets, and the failure to balance the load adequately among parallel tasks. First, a structure called \({\text{DiffNodeset}}\) is adopted to effectively avoid the growth of \(N{-}list\) cardinality in the \({\text{MRPrePost}}\) algorithm. Then, a 2-way comparison strategy is designed to speed up \({\text{DiffNodeset}}\) generation for 2-itemsets and reduce the time complexity of the algorithm. Next, the steps of the improved algorithm are parallelized using the Hadoop cloud computing platform and the MapReduce programming model. Moreover, to group the items in \(F{-}list\) evenly, a load balancing strategy based on dynamic grouping is proposed, which addresses the uneven load across the nodes of the cluster. The experimental results show that the modified algorithm not only overcomes the shortcomings of \({\text{MRPrePost}}\) in the big data environment, but also greatly reduces time and space complexity. Finally, applications of the \({\text{PFIMD}}\) algorithm to several multimedia datasets are presented to illustrate its generality.
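To make the idea of grouping \(F{-}list\) items evenly more concrete, the sketch below builds an F-list from a toy transaction set and then assigns items to groups with a greedy "heaviest item to the lightest group" heuristic. The abstract does not specify how the dynamic grouping of \({\text{PFIMD}}\) works, so this is only an illustration under stated assumptions: the function names `build_f_list` and `group_items`, the use of an item's support count as its load estimate, and the greedy heuristic itself are all hypothetical and are not taken from the paper.

```python
import heapq
from collections import Counter


def build_f_list(transactions, min_support):
    """Return frequent single items as (item, support) pairs, by descending support."""
    # Count each item at most once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [(item, c) for item, c in counts.items() if c >= min_support]
    # Ties are broken by item name so the order is deterministic.
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    return frequent


def group_items(f_list, num_groups):
    """Greedy grouping: give the next heaviest item to the currently lightest group.

    Assumption: an item's support count stands in for the work it causes.
    A real system may use a different load estimate.
    """
    # Min-heap of (accumulated load, group index).
    heap = [(0, g) for g in range(num_groups)]
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]
    for item, support in f_list:  # already sorted by descending support
        load, g = heapq.heappop(heap)
        groups[g].append(item)
        heapq.heappush(heap, (load + support, g))
    return groups


transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["a"],
    ["b", "d"],
]
f_list = build_f_list(transactions, min_support=2)
groups = group_items(f_list, num_groups=2)
print(f_list)
print(groups)
```

In this toy run the item `d` is dropped because its support is below the threshold, and the remaining items are split so that the accumulated support of the two groups differs by one. In a MapReduce setting, each group would then be handled by a separate task; how \({\text{PFIMD}}\) actually estimates load and regroups items is not described in the abstract.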

A Novel Parallel Algorithm for Frequent Itemsets Mining in Massive Small Files Datasets

A New Algorithm for Mining Global Frequent Itemsets in a Stream.

HPFP-Miner: A Novel Parallel Frequent Itemset Mining Algorithm

Mining Noise-Tolerant Frequent Closed Itemsets in Very Large Database.

Complex statistical analysis of big data: Implementation and application of Apriori and FP-Growth algorithm based on MapReduce

A STABLE PARALLEL DISTRIBUTED FREQUENT ITEMSET MINING ALGORITHM AND ITS APPLICATION

Balanced Parallel FP-Growth with MapReduce

PFIMD: a parallel MapReduce-based algorithm for frequent itemset mining

A FP Growth Algorithm Based on MapReduce Model and It′s Application

A Distributed Frequent Itemset Mining Algorithm Using Spark for Big Data Analytics

PNPFI: An Efficient Parallel Frequent Itemsets Mining Algorithm

A Distributed Method for Fast Mining Frequent Patterns From Big Data

Parallel Graph Pattern Matching in Massive Networks Based on MapReduce

Efficiently Mining Frequent Itemsets on Massive Data

An Efficient Method for the Parallel Mining of Frequent Itemsets in Very Large Text Databases

A Distributed Frequent Itemset Mining Algorithm Based on Spark

A New Algorithm for Frequent Itemsets Mining Based on Apriori and FP-Tree

YAFIM: A Parallel Frequent Itemset Mining Algorithm with Spark

Fast Algorithm for Mining Global Frequent Itemsets Based on Distributed Database

Incremental FP-Growth Mining Strategy for Dynamic Threshold Value and Database Based on MapReduce.

A Spark-based Incremental Algorithm for Frequent Itemset Mining