Abstract: Frequent itemset mining (FIM) is an important data mining technique that is widely used to discover frequent itemsets in many applications. As datasets grow rapidly in size, FIM in the big data environment has attracted many researchers and has led to numerous new FIM algorithms. Existing parallel FIM algorithms suffer from high time and space complexity when processing and computing itemsets, and they do not adequately balance the load among parallel tasks. To address these problems, this study designs an optimized parallel frequent itemset mining algorithm based on MapReduce, named PFIMD. First, a structure called DiffNodeset is adopted to effectively avoid the growth in N-list cardinality that occurs in the MRPrePost algorithm. Then, a 2-way comparison strategy is designed to speed up the generation of DiffNodesets for 2-itemsets and to reduce the time complexity of the algorithm. Next, the steps of the improved algorithm are parallelized using the cloud computing platform Hadoop and the MapReduce programming model. Moreover, to group the items in the F-list uniformly, a load balancing strategy based on dynamic grouping is proposed, which solves the problem of uneven load across the nodes of the cluster. The experimental results show that the modified algorithm not only overcomes the shortcomings of MRPrePost in the big data environment but also greatly reduces time and space costs. Finally, applications of the PFIMD algorithm to several multimedia datasets are presented to illustrate its generality.
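
To make the DiffNodeset idea concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. It assumes the PPC-tree encoding used by the PrePost family of algorithms: frequent items on each path follow F-list order, and every node carries a pre-order rank, a post-order rank and a count, so node A is an ancestor of node D exactly when A.pre < D.pre and A.post > D.post. For a 2-itemset, the DiffNodeset kept here is the list of nodes of the later F-list item that have no ancestor registering the earlier item, found in one linear merge over both pre-order-sorted lists. This merge is a generic stand-in and may differ from the paper's 2-way comparison strategy. Longer itemsets come from a set difference of two DiffNodesets that share a prefix, so the lists shrink instead of growing. All function names are hypothetical.

```python
# Minimal DiffNodeset sketch; names and the merge routine are hypothetical,
# not PFIMD's code.
from collections import Counter
from typing import Dict, List, Optional, Tuple

Code = Tuple[int, int, int]  # (pre-order rank, post-order rank, count) of a node


class Node:
    def __init__(self, item: Optional[str]) -> None:
        self.item = item
        self.count = 0
        self.children: Dict[str, "Node"] = {}
        self.pre = self.post = -1


def build_ppc_tree(transactions: List[List[str]], min_sup: int):
    """Prefix tree over frequent items, each path sorted by F-list order."""
    counts = Counter(i for t in transactions for i in set(t))
    flist = sorted((i for i, c in counts.items() if c >= min_sup),
                   key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(flist)}
    root = Node(None)
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.__getitem__):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item)
            child.count += 1
            node = child
    ranks = {"pre": 0, "post": 0}

    def number(n: Node) -> None:  # assign pre-order and post-order ranks
        n.pre = ranks["pre"]
        ranks["pre"] += 1
        for c in n.children.values():
            number(c)
        n.post = ranks["post"]
        ranks["post"] += 1

    number(root)
    return root, rank, counts


def nodesets(root: Node) -> Dict[str, List[Code]]:
    """Nodes registering each item, collected in ascending pre-order."""
    ns: Dict[str, List[Code]] = {}

    def walk(n: Node) -> None:
        if n.item is not None:
            ns.setdefault(n.item, []).append((n.pre, n.post, n.count))
        for c in n.children.values():
            walk(c)

    walk(root)
    return ns


def diffnodeset_2(ns_low: List[Code], ns_high: List[Code]) -> List[Code]:
    """Nodes of the later item with no ancestor registering the earlier item."""
    out: List[Code] = []
    j = 0
    for code in ns_low:
        pre, post, _ = code
        # Skip earlier-item nodes whose whole subtree precedes this node.
        while j < len(ns_high) and ns_high[j][0] < pre and ns_high[j][1] < post:
            j += 1
        is_ancestor = j < len(ns_high) and ns_high[j][0] < pre and ns_high[j][1] > post
        if not is_ancestor:
            out.append(code)
    return out


def pair(ns, rank, counts, x: str, y: str) -> Tuple[List[Code], int]:
    """DiffNodeset and support of the 2-itemset {x, y}."""
    high, low = sorted((x, y), key=rank.__getitem__)
    dn = diffnodeset_2(ns[low], ns[high])
    return dn, counts[low] - sum(c for _, _, c in dn)


def extend(dn_px: List[Code], sup_px: int, dn_py: List[Code]) -> Tuple[List[Code], int]:
    """DN(PXY) = DN(PY) minus DN(PX); support(PXY) = support(PX) - its counts."""
    drop = set(dn_px)
    dn = [code for code in dn_py if code not in drop]
    return dn, sup_px - sum(c for _, _, c in dn)


T = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
root, rank, counts = build_ppc_tree(T, min_sup=2)
ns = nodesets(root)
dn_cb, sup_cb = pair(ns, rank, counts, "b", "c")
dn_ca, sup_ca = pair(ns, rank, counts, "a", "c")
print(sup_cb, sup_ca, extend(dn_cb, sup_cb, dn_ca))
# prints: 3 3 ([(6, 4, 1)], 2)
```

Each support is obtained by subtraction: a 2-itemset from its later item's support, a longer itemset from its parent's support. On the toy data, {b, c} and {a, c} each have support 3 and {a, b, c} has support 2, matching a direct count over the five transactions, while the DiffNodeset of {a, b, c} holds a single node.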
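
The abstract does not detail how the steps are divided into MapReduce jobs. As an assumption, the sketch below simulates in plain Python, without Hadoop, the group-dependent sharding introduced by Parallel FP-Growth and widely reused by MapReduce FIM algorithms. The mapper sorts a transaction's frequent items by F-list rank and, for each item group, emits the prefix that ends at that group's last item in the transaction, so each reducer receives everything it needs to mine its own items. The names `map_phase`, `shuffle_and_reduce`, `rank` and `group_of` are hypothetical, and the reducer here only counts its own items instead of mining them.

```python
# Plain-Python simulation of a PFP-style sharding job (assumed, not PFIMD's
# code); no Hadoop involved.
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def map_phase(transaction: List[str], rank: Dict[str, int],
              group_of: Dict[str, int]) -> Iterator[Tuple[int, List[str]]]:
    """Emit (group id, prefix) pairs, at most one per group per transaction."""
    items = sorted((i for i in set(transaction) if i in rank), key=rank.__getitem__)
    emitted = set()
    for pos in range(len(items) - 1, -1, -1):  # scan from the least frequent item
        g = group_of[items[pos]]
        if g not in emitted:
            emitted.add(g)
            yield g, items[: pos + 1]  # prefix ending at this group's last item


def shuffle_and_reduce(transactions: List[List[str]], rank: Dict[str, int],
                       group_of: Dict[str, int]) -> Dict[int, Dict[str, int]]:
    """Group mapper output by key, then let each 'reducer' count its own items."""
    shards: Dict[int, List[List[str]]] = defaultdict(list)
    for t in transactions:
        for g, prefix in map_phase(t, rank, group_of):
            shards[g].append(prefix)
    result: Dict[int, Dict[str, int]] = {}
    for g, shard in shards.items():
        own = [i for i in rank if group_of[i] == g]
        # A real reducer would build a local tree and mine its shard here.
        result[g] = {i: sum(i in p for p in shard) for i in own}
    return result


T = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
rank = {"a": 0, "b": 1, "c": 2}  # F-list order for min_sup = 2
group_of = {"a": 0, "b": 1, "c": 1}  # hypothetical two-group split
print(shuffle_and_reduce(T, rank, group_of))
# prints: {1: {'b': 4, 'c': 4}, 0: {'a': 4}}
```

Each group receives, at most once per transaction, a prefix containing every one of its items that the transaction holds, so its reducer sees the full support of those items (4 for a, b and c here) and can work independently of the other groups.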
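
The dynamic grouping strategy itself is not specified in the abstract. A common way to even out reducer load, sketched below as an assumption rather than as PFIMD's method, is to estimate a cost for each F-list item and assign items greedily, heaviest first, to the currently lightest group (longest-processing-time scheduling). The cost model in `estimate_cost`, which grows with an item's position in the F-list on the premise that later items have larger conditional data, is a hypothetical placeholder, as are the function names.

```python
# Greedy longest-processing-time grouping of F-list items; the cost model is a
# hypothetical placeholder, not PFIMD's dynamic grouping strategy.
import heapq
import math
from typing import Dict, List


def estimate_cost(flist: List[str]) -> Dict[str, float]:
    """Hypothetical cost: grows with the item's position in the F-list."""
    return {item: math.log2(pos + 2) for pos, item in enumerate(flist)}


def group_items(flist: List[str], num_groups: int) -> List[List[str]]:
    """Assign items, heaviest first, to the group with the smallest total cost."""
    cost = estimate_cost(flist)
    heap = [(0.0, g) for g in range(num_groups)]  # (current load, group id)
    heapq.heapify(heap)
    groups: List[List[str]] = [[] for _ in range(num_groups)]
    for item in sorted(flist, key=cost.__getitem__, reverse=True):
        load, g = heapq.heappop(heap)
        groups[g].append(item)
        heapq.heappush(heap, (load + cost[item], g))
    return groups


print(group_items(["a", "b", "c", "d", "e", "f"], 2))
# prints: [['f', 'c', 'b'], ['e', 'd', 'a']]
```

With this heuristic, the gap between the heaviest and lightest group never exceeds the largest single item cost, because each item always goes to the group that is lightest at that moment.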

PFIMD: a parallel MapReduce-based algorithm for frequent itemset mining
