MapReduce Performance Optimizing through Replica Placement Strategy

Meng-yuan QIAN, Hui-qun YU
DOI: https://doi.org/10.3969/j.issn.1006-3080.2013.06.019
2013-01-01
Abstract: In heterogeneous environments, the nodes in a cluster differ in performance because of their different hardware configurations. Hadoop, the most widely used MapReduce implementation, does not sufficiently take heterogeneous environments into consideration. Moreover, in heterogeneous environments many map tasks are not data-local, which causes severe performance degradation. A novel replica placement strategy based on the performance of nodes is proposed. The strategy also takes into account reliability, the overhead of replica creation, and the performance balance between data blocks. Results show that the proposed replica placement effectively increases the proportion of data-local map tasks and decreases the response time of MapReduce jobs.
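The abstract does not give the details of the placement algorithm, so the following is only a minimal sketch of the general idea of performance-based replica placement, not the authors' method. It assumes each node has a single relative performance score and draws the replica holders for each block with probability proportional to that score, so that faster nodes hold more replicas and are more likely to find their map input locally. The node names, the scores, and the functions `place_replicas` and `replica_counts` are hypothetical, and the sketch ignores the reliability, replica-creation overhead, and balance criteria mentioned above.

```python
import random
from collections import Counter


def place_replicas(perf, replication, rng):
    """Pick `replication` distinct nodes; each draw is weighted by node performance.

    perf: dict mapping node name -> relative performance score (assumed metric).
    rng:  a random.Random instance, passed in so runs are reproducible.
    """
    if replication > len(perf):
        raise ValueError("replication factor exceeds number of nodes")
    candidates = dict(perf)  # copy, so the caller's dict is not modified
    chosen = []
    for _ in range(replication):
        nodes = list(candidates)
        weights = [candidates[n] for n in nodes]
        pick = rng.choices(nodes, weights=weights, k=1)[0]
        chosen.append(pick)
        del candidates[pick]  # a node holds at most one replica of a block
    return chosen


def replica_counts(perf, num_blocks, replication, seed=0):
    """Place `num_blocks` blocks and count how many replicas each node receives."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_blocks):
        counts.update(place_replicas(perf, replication, rng))
    return counts


# Hypothetical cluster: scores are illustrative, not taken from the paper.
PERF = {"fast": 4.0, "medium": 2.0, "slow-a": 1.0, "slow-b": 1.0}
COUNTS = replica_counts(PERF, num_blocks=2000, replication=2)

if __name__ == "__main__":
    for node, count in COUNTS.most_common():
        print(node, count)
```

Under this sketch the node with the highest score ends up holding the most replicas, which is the property a performance-aware strategy relies on to raise the share of data-local map tasks.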