Load Balancing in Heterogeneous MapReduce Environments

Yuanquan Fan, Weiguo Wu, Depei Qian, Yunlong Xu, Wei Wei
DOI: https://doi.org/10.1109/HPCC.and.EUC.2013.209
2013-01-01
Abstract: MapReduce has emerged as a popular computing model for parallel processing of big data. However, we observe that the native hash partitioning of MapReduce systems frequently distributes data unevenly among reduce tasks. This uneven distribution causes load imbalance among reduce tasks and thus hampers the performance of MapReduce systems. Moreover, heterogeneity among cluster nodes exacerbates the negative effects of uneven data distribution, because the nodes differ in performance. To address these issues, we propose a novel load balancing approach that takes cluster heterogeneity into account. The approach consists of two components: (1) performance estimation for reducers that run on heterogeneous nodes, based on the history of reduce tasks, and (2) heterogeneity-aware partitioning (HAP), which reallocates the input data for reduce tasks based on the performance estimation for reducers. We implement this approach as a plug-in for the current MapReduce system. Experimental results show that our approach improves the performance of MapReduce jobs that run in heterogeneous systems and incurs little overhead.
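The two components named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how performance is estimated or how data is reallocated, so everything here is an assumption made for illustration. The function names `estimate_speeds` and `partition`, the history format of `(node, bytes_processed, seconds)` tuples, the throughput-based estimate, and the greedy largest-first assignment are all hypothetical.

```python
from collections import defaultdict


def estimate_speeds(history):
    """Estimate each node's reduce throughput (bytes per second).

    `history` is an iterable of (node, bytes_processed, seconds) tuples
    taken from past reduce tasks. This record format is an assumption.
    """
    total_bytes = defaultdict(float)
    total_secs = defaultdict(float)
    for node, nbytes, secs in history:
        total_bytes[node] += nbytes
        total_secs[node] += secs
    # Skip nodes with no recorded running time to avoid division by zero.
    return {n: total_bytes[n] / total_secs[n]
            for n in total_bytes if total_secs[n] > 0}


def partition(key_sizes, speeds):
    """Assign keys to reducers so estimated finish times stay balanced.

    Greedy heuristic (an illustrative stand-in, not the paper's HAP):
    take keys from largest to smallest and give each one to the reducer
    whose estimated finish time after receiving it would be lowest.
    """
    load = {n: 0.0 for n in speeds}
    assignment = {}
    # Sort by descending size; break ties by key for determinism.
    ordered = sorted(key_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for key, size in ordered:
        best = min(load, key=lambda n: ((load[n] + size) / speeds[n], n))
        assignment[key] = best
        load[best] += size
    return assignment, load


if __name__ == "__main__":
    # Hypothetical history: the "fast" node processes twice as quickly.
    history = [("fast", 200, 1), ("fast", 200, 1), ("slow", 100, 1)]
    speeds = estimate_speeds(history)
    assignment, load = partition({"a": 60, "b": 30, "c": 30}, speeds)
    print(speeds, assignment, load)
```

Under these assumptions, a faster node receives proportionally more data than plain hash partitioning would give it, which is the intuition behind heterogeneity-aware partitioning as the abstract describes it.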