Interference-aware execution framework with Co-scheML on GPU clusters

Sejin Kim, Yoonhee Kim
DOI: https://doi.org/10.1007/s10586-021-03299-z
2021-05-18
Cluster Computing
Abstract: Improving overall resource utilization through efficient scheduling of applications on graphics processing unit (GPU) clusters has recently become a concern. Traditional cluster-orchestration platforms allocate GPUs exclusively to individual applications, which limits resource utilization. Co-execution of GPU applications has been suggested as a way to make better use of limited resources. However, co-executing GPU applications without considering their diverse characteristics can lead to unpredictable performance, owing to interference that results from resource contention and unbalanced resource usage among applications. This paper proposes an interference-aware execution framework with Co-scheML for various GPU applications, including high performance computing (HPC), deep learning (DL) training, and DL inference. The resource-usage characteristics of GPU applications are analyzed and profiled to identify the varying degrees of interference among them. Because interference prediction is challenging owing to the complexity of GPU systems, an interference model is generated by applying defined GPU metrics to machine learning (ML) models. The Co-scheML scheduler then deploys applications so as to minimize interference, using the interference predicted by the constructed model. Experimental results show that, compared to baseline schedulers, the framework improved resource utilization by 24%, improved the average job completion time (JCT) by 23%, and shortened the makespan by 22% on average.
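The scheduling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Job` and `GPU` classes, the two profiled metrics (`sm_util`, `mem_bw_util`), the limit of two jobs per GPU, and the hand-written `predict_interference` function are all assumptions made for illustration. In the paper the prediction comes from a trained ML model over profiled GPU metrics; here a simple oversubscription heuristic stands in for it.

```python
from dataclasses import dataclass, field
from math import inf
from typing import List, Optional


@dataclass
class Job:
    # Hypothetical profiled metrics, each a fraction of GPU capacity in [0, 1].
    name: str
    sm_util: float      # compute (streaming multiprocessor) utilization
    mem_bw_util: float  # memory bandwidth utilization


@dataclass
class GPU:
    gpu_id: int
    jobs: List[Job] = field(default_factory=list)


def predict_interference(a: Job, b: Job) -> float:
    """Stand-in for a trained interference model (assumption).

    Scores a pair of jobs by how far their combined demand exceeds
    the capacity of each resource; 0.0 means no predicted contention.
    """
    sm_over = max(0.0, a.sm_util + b.sm_util - 1.0)
    bw_over = max(0.0, a.mem_bw_util + b.mem_bw_util - 1.0)
    return sm_over + bw_over


def place(job: Job, gpus: List[GPU], max_jobs_per_gpu: int = 2) -> Optional[GPU]:
    """Greedily place a job on the GPU with the lowest predicted interference.

    Returns the chosen GPU, or None if every GPU is already full.
    """
    best, best_cost = None, inf
    for gpu in gpus:
        if len(gpu.jobs) >= max_jobs_per_gpu:
            continue  # no free co-execution slot on this GPU
        cost = sum(predict_interference(job, resident) for resident in gpu.jobs)
        if cost < best_cost:  # ties keep the earlier GPU
            best, best_cost = gpu, cost
    if best is not None:
        best.jobs.append(job)
    return best


if __name__ == "__main__":
    cluster = [GPU(0), GPU(1)]
    workload = [
        Job("hpc_a", sm_util=0.9, mem_bw_util=0.2),
        Job("hpc_b", sm_util=0.8, mem_bw_util=0.2),
        Job("dl_infer", sm_util=0.05, mem_bw_util=0.7),
    ]
    for job in workload:
        chosen = place(job, cluster)
        print(job.name, "->", None if chosen is None else chosen.gpu_id)
```

In this sketch the two compute-heavy jobs are kept on separate GPUs, while the bandwidth-bound job is co-located with one of them because their combined demand stays within capacity on both resources.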