Load-Aware Hybrid Scheduling in Large Compute Clusters

Di Fu, Jiahai Yang, Xiao Ling, Hui Zhang
DOI: https://doi.org/10.1109/iscc.2016.7543833
2016-01-01
Abstract: As workloads grow in large-scale heterogeneous compute clusters, distributed scheduling has gained support from both academia and industry because of its inherent scalability and flexibility. However, existing schedulers cannot guarantee that every job will be accepted, and their average latency can be extremely high. In particular, when schedulers apply gang scheduling and a non-preemptive policy, severe job starvation can occur, especially in heavily loaded clusters. In this paper, we introduce a novel hierarchical hybrid scheduler design, called En-Omega, to address this problem. En-Omega enhances fully distributed schedulers with a central scheduler that provides global fairness to jobs from different schedulers while sharply reducing the average latency of all jobs. To reduce overhead, En-Omega activates the central scheduler only when the cluster is heavily loaded. Furthermore, both the cache used for central queuing and the scoring policy used in central scheduling are load-aware. We evaluate En-Omega using the Google trace, and experimental results show that, compared to the baseline design, our method reduces the average latency of starving jobs by up to 90% with reasonable overhead.
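The abstract does not spell out En-Omega's algorithms, so the following is only a minimal Python sketch, under our own assumptions, of the general idea it describes: a central scheduler that stays idle under light load, activates above a load threshold, admits starving jobs into a bounded central queue whose size grows with load, orders them with a load-aware score, and places each gang all-or-nothing without preemption. The threshold value 0.75, the linear cache-size rule, the `score` formula, and every name (`Job`, `Cluster`, `central_cache_size`, `score`, `central_schedule`) are hypothetical illustrations, not the paper's actual design or parameters.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical activation threshold; the abstract gives no value.
HEAVY_LOAD_THRESHOLD = 0.75


@dataclass
class Job:
    job_id: str
    tasks: int        # gang scheduling: all tasks must start together
    wait_time: float  # seconds the job has already waited


@dataclass
class Cluster:
    total_slots: int
    used_slots: int

    def load(self) -> float:
        return self.used_slots / self.total_slots

    def free_slots(self) -> int:
        return self.total_slots - self.used_slots


def central_cache_size(load: float, base_size: int = 4) -> int:
    """Hypothetical load-aware cache: 0 under light load, grows linearly with load."""
    if load < HEAVY_LOAD_THRESHOLD:
        return 0
    scale = (load - HEAVY_LOAD_THRESHOLD) / (1.0 - HEAVY_LOAD_THRESHOLD)
    return base_size + int(scale * base_size)


def score(job: Job, load: float) -> float:
    """Hypothetical load-aware score: favour long waits, penalise large gangs more as load rises."""
    return job.wait_time / (1.0 + load * job.tasks)


def central_schedule(cluster: Cluster, starving: List[Job]) -> List[str]:
    """Place starving jobs centrally; return the ids of the jobs that were placed."""
    load = cluster.load()
    capacity = central_cache_size(load)
    if capacity == 0:
        return []  # light load: the central scheduler stays inactive
    # Admit only the longest-waiting jobs into the bounded central queue.
    queue = sorted(starving, key=lambda j: j.wait_time, reverse=True)[:capacity]
    # Order the admitted jobs by the (hypothetical) load-aware score.
    queue.sort(key=lambda j: score(j, load), reverse=True)
    placed = []
    for job in queue:
        # Non-preemptive, all-or-nothing placement of the whole gang.
        if job.tasks <= cluster.free_slots():
            cluster.used_slots += job.tasks
            placed.append(job.job_id)
    return placed


if __name__ == "__main__":
    busy = Cluster(total_slots=100, used_slots=90)
    jobs = [Job("a", 8, 120.0), Job("b", 2, 100.0), Job("c", 5, 30.0)]
    print(central_schedule(busy, jobs))  # ['b', 'a']
```

In this toy run the cluster is 90% loaded, so the central scheduler activates; job `b` scores highest because it is small, `a` then fills the remaining eight slots exactly, and `c` must keep waiting. At 50% load the same call returns an empty list, mirroring the abstract's point that the central scheduler is only engaged when the cluster is heavily loaded.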
What problem does this paper attempt to address?