H-PFSP: Efficient Hybrid Parallel PFSP Protected Scheduling for MapReduce System

Yin Li, Chuang Lin, Fengyuan Ren, Yifeng Geng
DOI: https://doi.org/10.1109/TrustCom.2013.133
2013-01-01
Abstract: MapReduce provides a data-parallel computing framework and has emerged as a popular processing model because of the simplicity of its operations for developers of big data applications. Data processing applications from many different domains, such as search and data mining, are usually developed using the open-source Hadoop implementation of MapReduce or self-developed MapReduce-like implementations such as Dryad [1] and Ciel [2]. In cloud environments, products like Amazon's Elastic Compute Cloud (EC2) [3] provide MapReduce as a third-party multi-tenant service. Even within a single company, a number of products may share the same MapReduce cluster. A fair and efficient scheduler is therefore crucial for improving the performance of submitted jobs and guaranteeing multi-user fairness. In practice, however, it is hard to guarantee both fairness and per-job performance, especially when jobs are scheduled without accurate estimation. We show that processor-sharing (PS) schedulers such as the Fair scheduler degrade per-job performance in a multi-user environment. We present a new scheduling policy, Hybrid Parallel pessimistic Fair Schedule Protocol (H-PFSP), that finishes every job no later than the Fair scheduler does. Unlike the Fair scheduler, however, it can also improve the per-job performance of MapReduce systems when job progress estimation is relatively accurate.
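The claim that a policy can finish every job no later than a processor-sharing scheduler can be illustrated with a minimal sketch. It is not the H-PFSP algorithm itself. It assumes a single unit-capacity resource, jobs that all arrive at time 0, and exactly known job sizes, and it compares processor sharing against a run-to-completion policy that serves jobs in the order in which processor sharing would finish them. The function names `ps_finish_times` and `ordered_finish_times` are hypothetical.

```python
def ps_finish_times(sizes):
    """Finish times under processor sharing (PS).

    Assumptions: one unit-capacity resource, all jobs arrive at time 0,
    and capacity is split equally among unfinished jobs.
    """
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    finish = [0.0] * len(sizes)
    now = 0.0      # current time
    served = 0.0   # service received so far by every unfinished job
    remaining = len(sizes)
    for i in order:
        # Each unfinished job progresses at rate 1/remaining, so closing
        # the gap (sizes[i] - served) takes that gap times `remaining`.
        now += (sizes[i] - served) * remaining
        served = sizes[i]
        finish[i] = now
        remaining -= 1
    return finish


def ordered_finish_times(sizes):
    """Finish times when jobs run one at a time, in PS finish order."""
    ps = ps_finish_times(sizes)
    order = sorted(range(len(sizes)), key=lambda i: ps[i])
    finish = [0.0] * len(sizes)
    now = 0.0
    for i in order:
        now += sizes[i]   # the running job gets the full capacity
        finish[i] = now
    return finish


sizes = [1, 2, 3]
ps = ps_finish_times(sizes)
ordered = ordered_finish_times(sizes)
print(ps)       # [3.0, 5.0, 6.0]
print(ordered)  # [1.0, 3.0, 6.0]
```

In this toy setting every job finishes no later than under processor sharing, and the two shorter jobs finish strictly earlier. With inaccurate size estimates the service order may differ from the true PS finish order, which is the situation the abstract identifies as the hard case.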