PNPFI: An Efficient Parallel Frequent Itemsets Mining Algorithm

Fang Zhang, Yu Zhang, Xiaofei Liao, Hai Jin
DOI: https://doi.org/10.1109/CSCWD.2018.8465270
2018-01-01
Abstract: Frequent itemsets mining (FIM) plays an important role in many areas of data mining. As data volumes have grown, a number of parallel FIM algorithms have been proposed. Although existing solutions scale well, they consume large amounts of CPU time and memory because they mine frequent itemsets recursively over a tree structure. In this paper, we propose a novel parallel algorithm named PNPFI, which employs three key optimizations. First, itemsets are stored in the N-list structure, which is more compact than existing tree-based structures. Second, a new structure called P-Subsume is used to generate some frequent itemsets without performing N-list intersection. Third, PNPFI introduces a new load balancing strategy that divides a large-scale FIM problem into a set of tasks based on the profiled load of each item. Experimental results show that, compared with state-of-the-art algorithms, PNPFI improves performance by 39% on average (up to 79%) and reduces memory usage by 58% on average (up to 90%).
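The abstract does not say how the per-item load is profiled or how tasks are assigned to workers. As a rough illustration of the general idea only, the sketch below assumes each item already has a numeric load estimate and distributes items with a common greedy heuristic (heaviest item first, onto the currently least-loaded worker). The function name `assign_items`, the load values, and the heuristic itself are assumptions for illustration, not the strategy described in the paper.

```python
import heapq


def assign_items(item_loads, num_workers):
    """Hypothetical helper: greedily group items so worker loads stay close.

    item_loads maps an item to an assumed, already-profiled load estimate.
    Returns one list of items per worker.
    """
    # Min-heap of (accumulated load, worker index).
    heap = [(0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    groups = [[] for _ in range(num_workers)]

    # Visit items from heaviest to lightest; ties are broken by item name.
    ordered = sorted(item_loads.items(), key=lambda kv: (-kv[1], kv[0]))
    for item, load in ordered:
        total, w = heapq.heappop(heap)  # least-loaded worker so far
        groups[w].append(item)
        heapq.heappush(heap, (total + load, w))
    return groups


# Made-up load estimates for five items, split across two workers.
loads = {"a": 9, "b": 7, "c": 6, "d": 5, "e": 3}
groups = assign_items(loads, 2)
print(groups)
```

With these made-up loads, the two groups end up with totals of 14 and 16, which is the kind of near-even split a load-aware partitioning aims for.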