An Incremental Algorithm For Frequent Itemset Mining On Spark

Min Yu, Chuang Zuo, Yunpeng Yuan, Yulu Yang
DOI: https://doi.org/10.1109/ICBDA.2017.8078823
2017-01-01
Abstract: Frequent itemset mining is one of the most investigated fields of data mining. Mining frequent itemsets from a large-scale data set is expensive. In particular, when new data is added to the data set, recomputing over the complete data set from scratch to update its frequent itemsets remains time-consuming. To improve the performance of frequent itemset mining on large-scale, dynamic data sets, we propose a new incremental Apriori algorithm based on Spark. It reuses the results of previous computation to update the frequent itemsets according to the newly added data, which avoids massive recomputation. The proposed algorithm also takes full advantage of distributed resources with the support of Spark. A theoretical analysis shows that the proposed algorithm is correct. Experiments on a real-world data set demonstrate that the proposed algorithm effectively avoids redundant computation and improves the performance of frequent itemset mining with no additional storage overhead.
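The abstract does not spell out the update procedure, so the following is only a minimal single-machine sketch of the general idea of incremental frequent itemset mining, not the authors' Spark algorithm. It assumes that the support counts of the previously frequent itemsets were kept from the earlier run, that support is a fraction of the number of transactions, and that itemsets are capped at a small size. The names `count_itemsets`, `mine`, `incremental_update`, and `max_size` are hypothetical. It relies on one observation: an itemset that was infrequent in the old data can only become frequent overall if it is frequent within the newly added data, so only those few itemsets need a pass over the old data.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions, max_size):
    """Count every itemset of up to max_size items (brute force, illustration only)."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return counts


def mine(transactions, min_support, max_size=3):
    """Mine from scratch: keep itemsets whose count reaches min_support * |DB|."""
    threshold = min_support * len(transactions)
    counts = count_itemsets(transactions, max_size)
    return {s: c for s, c in counts.items() if c >= threshold}


def incremental_update(old_frequent, old_db, new_db, min_support, max_size=3):
    """Update frequent itemsets after new_db is appended to old_db.

    old_frequent maps each previously frequent itemset to its count in old_db
    (assumed to have been stored by the earlier run).
    """
    threshold = min_support * (len(old_db) + len(new_db))
    delta_counts = count_itemsets(new_db, max_size)  # scan only the new data
    updated = {}

    # Case 1: previously frequent itemsets need only their count in the new data.
    for itemset, old_count in old_frequent.items():
        total = old_count + delta_counts.get(itemset, 0)
        if total >= threshold:
            updated[itemset] = total

    # Case 2: previously infrequent itemsets can qualify only if they are
    # frequent within the new data; only these are re-counted on the old data.
    delta_threshold = min_support * len(new_db)
    for itemset, delta_count in delta_counts.items():
        if itemset in old_frequent or delta_count < delta_threshold:
            continue
        old_count = sum(1 for t in old_db if itemset <= set(t))
        if old_count + delta_count >= threshold:
            updated[itemset] = old_count + delta_count

    return updated


old_db = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
new_db = [["c", "d"], ["c", "d"], ["a", "c", "d"], ["c"]]

old_frequent = mine(old_db, min_support=0.5)
updated = incremental_update(old_frequent, old_db, new_db, min_support=0.5)
full = mine(old_db + new_db, min_support=0.5)
```

On this toy data the incremental result in `updated` equals the from-scratch result in `full`. In a Spark setting, the counting passes would presumably be distributed over partitions of the data, but that mapping is an assumption and is not shown here.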