Hybrid Workload Scheduling on HPC Systems

Yuping Fan, Paul Rich, William Allcock, Michael Papka, Zhiling Lan
DOI: https://doi.org/10.48550/arXiv.2109.05412
2021-09-12
Abstract: Traditionally, on-demand, rigid, and malleable applications have been scheduled and executed on separate systems. Ever-growing workload demands and rapidly developing HPC infrastructure have triggered interest in converging these applications on a single HPC system. Although allocating hybrid workloads within one system could potentially improve system efficiency, it is difficult to balance the tradeoff between the responsiveness of on-demand requests, the incentive for malleable jobs, and the performance of rigid applications. In this study, we present several scheduling mechanisms to address the issues involved in co-scheduling on-demand, rigid, and malleable jobs on a single HPC system. We extensively evaluate and compare their performance under various configurations and workloads. Our experimental results show that the proposed mechanisms are capable of serving on-demand workloads with minimal delay, offering incentives for declaring malleability, and improving system performance.
Distributed, Parallel, and Cluster Computing