IABS: parallel improved Apriori algorithm based on Spark

Mengjie Yan, Jun Luo, Jianying Liu, Chuanwang Hou
DOI: https://doi.org/10.3969/j.issn.1001-3695.2017.08.007
2017-01-01
Abstract: The Apriori algorithm is one of the most classical algorithms in association rule mining, and its core problem is the generation of frequent itemsets. First, to address known problems of the classical Apriori algorithm, namely that it must scan the global transaction database several times and must generate candidate itemsets, this paper optimizes it by transforming the storage structure and eliminating candidate itemset generation. Second, with the advent of the big data era, data volumes grow daily and the classical Apriori algorithm faces severe challenges. Building on the improved Apriori algorithm and the Spark platform, this paper proposes the IABS algorithm, which makes full use of Spark features such as in-memory computation and resilient distributed datasets. Experiments validate the sizeup and node scalability of IABS, and compared with existing similar algorithms IABS achieves an average performance improvement of 23.88% across various benchmarks. The improvement becomes more pronounced as the data size grows.
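The abstract does not specify the transformed storage structure, so the following is only a minimal single-machine sketch of one common way to avoid repeated database scans: a vertical layout that maps each item to the set of transaction IDs containing it, so that the support of a larger itemset is obtained by intersecting ID sets. The function names `build_tidsets` and `frequent_itemsets`, the vertical layout itself, and the absolute-count `min_support` parameter are all assumptions made for illustration and are not taken from the paper. The sketch also does not use Spark.

```python
from itertools import combinations


def build_tidsets(transactions):
    """Scan the transactions once and map each item to its transaction IDs."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def frequent_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets.

    Assumption: min_support is an absolute count, not a ratio.
    """
    tidsets = build_tidsets(transactions)
    # Level 1: keep only the single items that meet the support threshold.
    level = {
        frozenset([item]): tids
        for item, tids in tidsets.items()
        if len(tids) >= min_support
    }
    result = {}
    while level:
        result.update({itemset: len(tids) for itemset, tids in level.items()})
        next_level = {}
        # Join pairs of frequent k-itemsets that differ in exactly one item.
        for (a, tids_a), (b, tids_b) in combinations(level.items(), 2):
            union = a | b
            if len(union) != len(a) + 1 or union in next_level:
                continue
            # Support comes from intersecting ID sets, not from a new scan.
            tids = tids_a & tids_b
            if len(tids) >= min_support:
                next_level[union] = tids
        level = next_level
    return result


transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
result = frequent_itemsets(transactions, min_support=3)
```

In this sketch the original transactions are read only once, in `build_tidsets`; every later level works on the ID sets kept in memory. A distributed version would have to partition this work across nodes, which the sketch does not attempt.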