Data Mining Using Clouds: An Experimental Implementation of Apriori over MapReduce

Juan Li, Pallavi Roy, Samee U. Khan, Lizhe Wang, Yan Bai
2012-01-01
Abstract: Cloud computing has become a viable mainstream solution for data processing, storage, and distribution. It promises on-demand, scalable, pay-as-you-go compute and storage capacity. To analyze "big data" on clouds, it is important to study data mining strategies based on the cloud computing paradigm from both theoretical and practical perspectives. For this purpose, we study a strategy for data mining on the cloud, using association rule mining as an example. In particular, we redesign and convert an existing sequential association rule algorithm, Apriori, to run on the MapReduce parallel computing platform. We implement and evaluate the proposed algorithm on the Amazon EC2 MapReduce platform. Preliminary experimental results documented in this paper demonstrate the efficiency of our approach.

Keywords: cloud computing, data mining, association rule
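To make the idea of running Apriori over MapReduce concrete, here is a minimal single-process sketch of one common way to phrase each Apriori level as a map step and a reduce step. It is an illustration under stated assumptions, not the authors' implementation: the function names (`map_phase`, `reduce_phase`, `frequent_itemsets`), the absolute `min_support` count, and the in-memory "shuffle" are all hypothetical, and a real deployment would distribute the map and reduce calls across cluster nodes.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction, k, prev_frequent):
    """Map step: emit (itemset, 1) for each candidate k-itemset in one transaction.

    Apriori pruning: a k-itemset is a candidate only if every (k-1)-subset
    was frequent in the previous level.
    """
    for itemset in combinations(sorted(set(transaction)), k):
        if k == 1 or all(
            sub in prev_frequent for sub in combinations(itemset, k - 1)
        ):
            yield itemset, 1


def reduce_phase(pairs, min_support):
    """Reduce step: sum the counts per itemset and keep the frequent ones."""
    counts = defaultdict(int)
    for itemset, count in pairs:
        counts[itemset] += count
    return {s: c for s, c in counts.items() if c >= min_support}


def frequent_itemsets(transactions, min_support):
    """Run one map/reduce round per level until no frequent itemsets remain."""
    result, prev, k = {}, {}, 1
    while True:
        # Stand-in for the shuffle: collect every mapper's output in memory.
        pairs = (p for t in transactions for p in map_phase(t, k, prev))
        level = reduce_phase(pairs, min_support)
        if not level:
            return result
        result.update(level)
        prev, k = level, k + 1


if __name__ == "__main__":
    baskets = [
        ["bread", "milk"],
        ["bread", "diaper", "beer"],
        ["milk", "diaper", "beer"],
        ["bread", "milk", "diaper"],
        ["bread", "milk", "beer"],
    ]
    for itemset, support in sorted(frequent_itemsets(baskets, 3).items()):
        print(itemset, support)
```

Each level needs the previous level's frequent itemsets before it can prune, so in this formulation the levels run as a chain of MapReduce jobs rather than as a single job.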