Complex statistical analysis of big data: Implementation and application of Apriori and FP-Growth algorithm based on MapReduce

Zhuobo Rong, Dawen Xia, Zili Zhang
DOI: https://doi.org/10.1109/ICSESS.2013.6615467
2013-01-01
Abstract: In a single-machine environment, the Apriori and FP-Growth algorithms suffer from high memory consumption, low computing performance, and poor scalability and reliability when mining association rules from large-scale data. We therefore propose a new implementation, based on the MapReduce parallel environment, that mines frequent itemsets to generate association rules. The method is verified on real datasets of different sizes, using clusters with different numbers of nodes, with speedup, scalability, and reliability as the evaluation indicators. The results show that the method is feasible and valid, and that it improves the overall performance and efficiency of the Apriori and FP-Growth algorithms to meet the needs of large-scale association rule mining.
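To make the idea of MapReduce-based frequent itemset counting concrete, the following is a minimal single-process sketch in Python and not the authors' implementation. It imitates one counting pass: a map step emits each k-item subset of a transaction with a count of 1, and a reduce step sums the counts per itemset and keeps those that reach a minimum support. The function names, the toy transactions, and the support threshold are all assumptions made for illustration. The sketch enumerates every k-subset rather than generating candidates from frequent (k-1)-itemsets as Apriori proper does, and it runs in memory rather than on a cluster.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, k):
    """Emit (itemset, 1) for every k-item subset of one transaction."""
    items = sorted(set(transaction))  # canonical order so keys match across transactions
    for combo in combinations(items, k):
        yield combo, 1


def reduce_phase(pairs):
    """Sum the counts per itemset key, as a reducer would after the shuffle."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return counts


def frequent_itemsets(transactions, k, min_support):
    """Run one map/reduce counting pass and keep itemsets meeting min_support."""
    pairs = (pair for t in transactions for pair in map_phase(t, k))
    counts = reduce_phase(pairs)
    return {itemset: c for itemset, c in counts.items() if c >= min_support}


# Hypothetical toy data, not taken from the paper.
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]

result = frequent_itemsets(transactions, k=2, min_support=3)
for itemset, count in sorted(result.items()):
    print(itemset, count)
```

In a real MapReduce deployment the map calls would run on separate nodes over partitions of the transaction file, and the framework would group the emitted pairs by key before the reduce step; here a generator and a `Counter` stand in for both.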