PISCES: Optimizing Multi-Job Application Execution in MapReduce

Qi Chen,Jinyu Yao,Benchao Li,Zhen Xiao
DOI: https://doi.org/10.1109/tcc.2016.2603509
IF: 5.697
2019-01-01
IEEE Transactions on Cloud Computing
Abstract:Nowadays, many MapReduce applications consist of groups of jobs with dependencies among each other, such as iterative machine learning applications and large database queries. Unfortunately, the MapReduce framework is not optimized for these multi-job applications. It does not explore the execution overlapping opportunities among jobs and can only schedule jobs independently. These issues significantly inflate the application execution time. This paper presents Pipeline Improvement Support with Critical chain Estimation Scheduling (PISCES), a critical chain optimization (a critical chain refers to a series of jobs which will make the application run longer if any one of them is delayed), to provide better support for multi-job applications. PISCES extends the existing MapReduce framework to allow scheduling for multiple jobs with dependencies by dynamically building up a job dependency DAG for current running jobs according to their input and output directories. Then using the dependency DAG, it provides an innovative mechanism to facilitate the data pipelining between the output phase (map phase in the Map-Only job or reduce phase in the Map-Reduce job) of an upstream job and the map phase of a downstream job. This offers a new execution overlapping between dependent jobs in MapReduce which effectively reduces the application runtime. Moreover, PISCES proposes a novel critical chain job scheduling model based on the accurate critical chain estimation. Experiments show that PISCES can increase the degree of system parallelism by up to 68 percent and improve the execution speed of applications by up to 52 percent.
What problem does this paper attempt to address?