Analysis and Improvement of Makespan and Utilization for MapReduce

Yin Li, Chuang Lin, Fengyuan Ren
DOI: https://doi.org/10.1109/HPCC.and.EUC.2013.69
2013-01-01
Abstract: A MapReduce cluster is usually shared by multiple users or products, each aiming to accelerate its own jobs. In contrast, the utilization of the cluster is the main concern of the system itself. MapReduce jobs are split into independent tasks during execution. In practice, however, the number of tasks per stage for each job and the system settings are often sub-optimal for a given workload. To the best of our knowledge, little research has focused on the impact of influential factors such as task granularity, slot count, and workload, or on performance analysis from both the per-job and system perspectives. We construct models that describe the effects of these factors on performance from both perspectives. Based on the understanding provided by the analytical models, we discuss the optimal settings of job and system parameters and then propose a batch scheduling policy to replace FIFO, so as to improve system utilization and reduce the average job processing time. Simulation results show that, under the batch scheduling policy, the average gap time is reduced by 10% and the mean job makespan is improved by 15% when each job is preempted 6 times on average.