MapReduce: A New Programming Model for Distributed Parallel Computing

LI Cheng-hua, ZHANG Xin-fang, JIN Hai, XIANG Wen
DOI: https://doi.org/10.3969/j.issn.1007-130x.2011.03.023
2011-01-01
Abstract: MapReduce is a programming model introduced by Google for writing applications that rapidly process vast amounts of data in parallel on large clusters of computing nodes. The model is inspired by the map and reduce functions commonly used in functional programming. A MapReduce job usually splits the input data set into independent chunks, which the map tasks process in a completely parallel manner. The reduce tasks then merge all intermediate values generated by the map tasks. Users need only specify the map and reduce functions. The details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication are handled by the MapReduce run-time system. MapReduce is expected to be widely adopted on cloud computing platforms. Several aspects of Apache's Hadoop MapReduce implementation still need improvement.
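The division of labor described in the abstract can be illustrated with a minimal single-process sketch in Python. It is not the paper's implementation and not the Hadoop API: the word-count task and the names `map_words`, `reduce_counts`, and `run_job` are assumptions chosen for illustration, and the `run_job` loop only imitates the grouping step that a real run-time system would distribute across machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(chunk: str) -> Iterator[Tuple[str, int]]:
    """User-supplied map function (hypothetical): emit (word, 1) per word."""
    for word in chunk.split():
        yield word.lower(), 1


def reduce_counts(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """User-supplied reduce function (hypothetical): sum the values of one key."""
    return key, sum(values)


def run_job(
    chunks: Iterable[str],
    map_fn: Callable[[str], Iterable[Tuple[str, int]]],
    reduce_fn: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Toy stand-in for the run-time system: map each chunk, group by key, reduce."""
    intermediate: Dict[str, List[int]] = defaultdict(list)
    # Map phase: chunks are independent, so a real system could run these in parallel.
    for chunk in chunks:
        for key, value in map_fn(chunk):
            intermediate[key].append(value)
    # Reduce phase: merge all intermediate values that share a key.
    return dict(reduce_fn(key, values) for key, values in sorted(intermediate.items()))


if __name__ == "__main__":
    input_chunks = ["the quick brown fox", "the lazy dog", "the fox"]
    print(run_job(input_chunks, map_words, reduce_counts))
```

Only the two small functions at the top correspond to what the user writes. Partitioning, scheduling, failure handling, and inter-machine communication are all absent from this sketch because the model assigns them to the run-time system.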