Network-Adaptive Scheduling of Data-Intensive Parallel Jobs with Dependencies in Clusters

Shaoqi Wang, Xiaobo Zhou, Liqiang Zhang, Changjun Jiang
DOI: https://doi.org/10.1109/ICAC.2017.13
2017-01-01
Abstract: The performance of data-intensive parallel jobs is often constrained by the cluster's hard-to-scale network bisection bandwidth. Previous solutions do not consider inter-job data dependencies and schedule jobs independently of one another. In this work, we find that aggregating and co-locating the data and tasks of dependent jobs offers an extra opportunity for data locality that can greatly enhance the performance of clusters and jobs. We propose and design Dawn, a network-adaptive scheduler that includes an online plan and an adaptive task scheduler for jobs with dependencies. The online plan, taking job dependencies into consideration, determines preferred locations (e.g., racks) for tasks in order to proactively aggregate dependent data. The task scheduler, based on the output of the online plan and the dynamic network bandwidth status, adaptively schedules tasks to co-locate them with the dependent data and so take advantage of data locality. We implement Dawn on Apache YARN and evaluate it on clusters using various benchmark workloads. Results show that Dawn improves system throughput by 42-51% and 21-28% compared to Fair Scheduler and ShuffleWatcher, respectively.
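The abstract names Dawn's two components but does not give their algorithms. The following is only a minimal Python sketch, under assumed inputs, of the two ideas it describes: a plan step that picks a preferred rack per task from where its dependent data lives, and an adaptive step that decides what to launch when a slot frees up, given spare network bandwidth. The names `Task`, `preferred_rack`, `remote_bytes`, `choose_task`, `spare_bw`, and `bw_threshold`, the "most dependent bytes wins" rule, and the fixed bandwidth threshold are all hypothetical stand-ins, not Dawn's actual method.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Task:
    """Hypothetical task record: bytes of parent-job (dependent) data on each rack."""
    name: str
    input_bytes_by_rack: Dict[str, int]


def preferred_rack(task: Task) -> str:
    """Plan step (assumed rule): prefer the rack that already holds the most
    dependent data, so the dependent data stays aggregated on one rack."""
    return max(task.input_bytes_by_rack, key=task.input_bytes_by_rack.get)


def remote_bytes(task: Task, rack: str) -> int:
    """Bytes that would cross the rack boundary if the task ran on `rack`."""
    return sum(b for r, b in task.input_bytes_by_rack.items() if r != rack)


def choose_task(free_rack: str, pending: List[Task],
                spare_bw: Dict[str, float], bw_threshold: float) -> Optional[Task]:
    """Adaptive step (assumed rule) for a slot that frees up on `free_rack`.

    1. Launch a task whose preferred rack is `free_rack` (rack-local).
    2. Otherwise launch the task with the least cross-rack input, but only if
       the spare bandwidth into `free_rack` is at least `bw_threshold`.
    3. Otherwise return None, leaving the slot idle until conditions change.
    """
    for task in pending:
        if preferred_rack(task) == free_rack:
            return task
    if pending and spare_bw.get(free_rack, 0.0) >= bw_threshold:
        return min(pending, key=lambda t: remote_bytes(t, free_rack))
    return None


# Hypothetical example: two tasks whose dependent data sits mostly on rackA / rackB.
tasks = [
    Task("t1", {"rackA": 800, "rackB": 100}),
    Task("t2", {"rackA": 50, "rackB": 900}),
]
spare_bw = {"rackA": 2.0, "rackB": 2.0, "rackC": 0.5}  # spare Gbit/s into each rack

print(choose_task("rackB", tasks, spare_bw, 1.0).name)  # t2 (rack-local)
print(choose_task("rackC", tasks, spare_bw, 1.0))       # None (network too busy)
print(choose_task("rackC", tasks, spare_bw, 0.4).name)  # t1 (900 < 950 remote bytes)
```

Returning `None` trades a short scheduling delay for avoiding congested cross-rack links, similar in spirit to delay scheduling; whether Dawn makes exactly this trade-off is not stated in the abstract.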