MapReduce-Based Balanced Mining for Closed Frequent Itemset

CHEN Guang-Peng, YANG Yu-Bin, GAO Yang, SHANG Lin
DOI: https://doi.org/10.1109/icws.2012.19
2012-01-01
Abstract: Mining closed frequent itemsets (CFIs) plays an essential role in many real-world data mining applications. With the emergence of abundant large-scale data sets, mining CFIs in parallel has become a significant and challenging problem. This paper proposes a parallel balanced mining algorithm for CFIs based on the MapReduce platform. The algorithm adopts a greedy strategy to group items so as to balance the computation burden among all parallel tasks, and it consists of three main steps: (1) Parallel Counting, (2) Global Construction of the Frequent List (F_list) and Group Map (G_map), and (3) Parallel Mining for Closed Frequent Itemsets. Experimental results validate the method and show its effectiveness: satisfactory speedup and scalability are both achieved in large-scale CFI mining tasks.
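Steps (1) and (2), together with the greedy grouping, can be illustrated with a small sketch. This is a minimal in-process Python sketch, not the authors' implementation. The MapReduce jobs are simulated with plain functions, and the function names (`parallel_counting`, `build_f_list`, `build_g_map`), the `min_support` threshold, and the cost model (an item's estimated mining cost equals its 1-based position in F_list) are all hypothetical. The greedy rule shown is longest-processing-time-first assignment: items are taken in decreasing estimated cost, and each goes to the currently least-loaded group. The paper's actual cost estimate and greedy rule may differ, and step (3) is not sketched.

```python
from collections import Counter
from functools import reduce
import heapq
from typing import Callable, Dict, Iterable, List, Sequence

Transaction = Sequence[str]


def map_count(partition: Iterable[Transaction]) -> Counter:
    """Simulated map task: count each distinct item once per transaction."""
    counts: Counter = Counter()
    for transaction in partition:
        counts.update(set(transaction))
    return counts


def parallel_counting(partitions: List[List[Transaction]]) -> Counter:
    """Step (1) sketch: per-partition counts merged as a reducer would merge them."""
    return reduce(lambda a, b: a + b, (map_count(p) for p in partitions), Counter())


def build_f_list(counts: Counter, min_support: int) -> List[str]:
    """Step (2a): frequent items sorted by descending support (ties broken by name)."""
    frequent = [(item, c) for item, c in counts.items() if c >= min_support]
    frequent.sort(key=lambda ic: (-ic[1], ic[0]))
    return [item for item, _ in frequent]


def build_g_map(
    f_list: List[str],
    num_groups: int,
    # Hypothetical cost model: estimated cost grows with the item's position in F_list.
    cost: Callable[[int], float] = lambda pos: pos + 1,
) -> Dict[str, int]:
    """Step (2b): greedy grouping, most expensive item first, into the least-loaded group."""
    positions = sorted(range(len(f_list)), key=cost, reverse=True)
    heap = [(0.0, g) for g in range(num_groups)]  # (current estimated load, group id)
    heapq.heapify(heap)
    g_map: Dict[str, int] = {}
    for pos in positions:
        load, group = heapq.heappop(heap)  # least-loaded group so far
        g_map[f_list[pos]] = group
        heapq.heappush(heap, (load + cost(pos), group))
    return g_map


if __name__ == "__main__":
    # Two hypothetical input splits, as two map tasks would see them.
    partitions = [
        [["a", "b", "c"], ["a", "c"], ["a", "d"]],
        [["b", "c"], ["a", "b", "c", "e"], ["c"]],
    ]
    f_list = build_f_list(parallel_counting(partitions), min_support=2)
    print(f_list)                             # ['c', 'a', 'b']
    print(build_g_map(f_list, num_groups=2))  # {'b': 0, 'a': 1, 'c': 1}
```

On this toy data, the two groups end up with equal estimated loads of 3 (item b alone versus items a and c).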