Optimization and reconstruction shuffle in MapReduce

Peng Fuquan, Jin Canghong, Wu Minghui, Ying Jing
DOI: https://doi.org/10.3969/j.issn.2095-2783.2012.04.001
2012-01-01
Abstract: We describe the MapReduce programming framework in detail and analyze the shuffle stage. The shuffle in MapReduce is optimized and reconstructed through three measures: compressing the output of the Map end, reconstructing the protocol used to copy data from the Map end to the Reduce end, and optimizing memory allocation on the Reduce end. Finally, a Hadoop cluster is built and the experimental data are tested using the MapReduce distributed algorithm. Experimental results show that MapReduce computing performance improves significantly with the optimized and reconstructed shuffle.
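The first of the three measures, compressing Map-side output before it is copied to the Reduce side, can be illustrated with a small stand-alone sketch. This is not the authors' implementation and does not use Hadoop. The function names, the tab-separated record format, the byte-sum partitioner, and the use of zlib are all assumptions chosen for illustration.

```python
import zlib
from collections import defaultdict


def partition_map_output(records, num_reducers):
    """Group (key, value) pairs by target reducer, as a map task would."""
    partitions = defaultdict(list)
    for key, value in records:
        # Hypothetical deterministic partitioner: sum of the key's bytes.
        idx = sum(key.encode("utf-8")) % num_reducers
        partitions[idx].append((key, value))
    return partitions


def serialize(pairs):
    """Encode pairs as tab-separated lines (an assumed wire format)."""
    return "\n".join(f"{k}\t{v}" for k, v in pairs).encode("utf-8")


def deserialize(blob):
    """Inverse of serialize; values are assumed to be integers."""
    text = blob.decode("utf-8")
    if not text:
        return []
    pairs = []
    for line in text.split("\n"):
        key, value = line.split("\t")
        pairs.append((key, int(value)))
    return pairs


def compress_partition(pairs):
    """Map side: compress one partition before it is sent over the network."""
    return zlib.compress(serialize(pairs))


def fetch_partition(blob):
    """Reduce side: decompress a fetched partition back into pairs."""
    return deserialize(zlib.decompress(blob))


# Word-count style map output with many repeated keys.
records = [(w, 1) for w in ["hadoop", "shuffle", "map", "reduce"] * 250]
partitions = partition_map_output(records, num_reducers=2)

raw_bytes = sum(len(serialize(p)) for p in partitions.values())
sent = {i: compress_partition(p) for i, p in partitions.items()}
sent_bytes = sum(len(b) for b in sent.values())
received = {i: fetch_partition(b) for i, b in sent.items()}

print("uncompressed bytes:", raw_bytes)
print("compressed bytes:", sent_bytes)
```

The reducers recover exactly the pairs the map task produced, while fewer bytes cross the network. The trade-off is extra CPU time for compression and decompression on both ends. In current Hadoop releases the corresponding switch is the `mapreduce.map.output.compress` job property; the abstract does not state which codec or settings the authors used.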