Efficiency Optimization Method for MapReduce Similarity Computing Based on Spark

Bin Liao, Tao Zhang, Jiong Yu, Binglei Guo, Yan Liu
DOI: https://doi.org/10.11896/j.issn.1002-137X.2017.08.009
2018-01-01
Abstract: With the exponential growth of both internet users and content, similarity computation over big data needs to become more efficient. Because Spark is well suited to iterative and interactive tasks, the implementation of the similarity algorithm was analyzed, and the version based on the 2D partition algorithm was ported from MapReduce to Spark. Parameter adjustment, memory optimization, and related tuning further improved the efficiency of the algorithm. Experimental results with 2 data sets on clusters of 3 different sizes indicate that the execution efficiency of the algorithm on the Spark platform is 4.715 times higher than on MapReduce, and that its energy consumption is only 24.86% of the average energy consumption of Hadoop, which corresponds to an energy efficiency about 4 times higher than that of Hadoop (1/0.2486 ≈ 4.02).
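The abstract does not spell out the 2D partition scheme, so the following is only a minimal single-machine sketch of one common reading of it: items are split into blocks, and each pair of blocks becomes an independent task that could be handed to a MapReduce or Spark worker. The function names, the modulo-based block assignment, the integer item ids, and the choice of cosine similarity are all assumptions made for illustration and are not taken from the paper.

```python
from itertools import combinations_with_replacement
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def partition(vectors, num_blocks):
    """Assign integer item ids to blocks by id modulo num_blocks (assumed scheme)."""
    blocks = [[] for _ in range(num_blocks)]
    for item_id in sorted(vectors):
        blocks[item_id % num_blocks].append(item_id)
    return blocks


def block_pair_tasks(num_blocks):
    """Upper triangle of the num_blocks x num_blocks grid: one task per block pair."""
    return list(combinations_with_replacement(range(num_blocks), 2))


def similarities_for_task(vectors, blocks, task):
    """Compute similarities for all item pairs covered by one block pair."""
    bi, bj = task
    out = {}
    for a in blocks[bi]:
        for b in blocks[bj]:
            # Inside a diagonal block, keep each unordered pair only once.
            if bi == bj and a >= b:
                continue
            out[(min(a, b), max(a, b))] = cosine(vectors[a], vectors[b])
    return out


def all_pairs_similarity(vectors, num_blocks):
    """Run every block-pair task sequentially and merge the results."""
    blocks = partition(vectors, num_blocks)
    result = {}
    for task in block_pair_tasks(num_blocks):
        result.update(similarities_for_task(vectors, blocks, task))
    return result


# Hypothetical toy data: item id -> feature vector.
vectors = {0: [1, 0], 1: [0, 1], 2: [1, 1], 3: [2, 0]}
sims = all_pairs_similarity(vectors, num_blocks=2)
```

In a distributed setting each block-pair task would presumably run on a separate worker. Under that assumption, keeping the blocks cached in memory between tasks is the kind of reuse that Spark supports more directly than MapReduce.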