ArkGPU: enabling applications’ high-goodput co-location execution on multitasking GPUs
Jie Lou, Yiming Sun, Jie Zhang, Huawei Cao, Yuan Zhang, Ninghui Sun
DOI: https://doi.org/10.1007/s42514-023-00154-y
2023-05-24
CCF Transactions on High Performance Computing
Abstract: With the development of deep learning, hardware accelerators such as GPUs are widely used to speed up deep learning applications. A key problem in GPU clusters is how to schedule diverse deep learning applications, including training applications and latency-critical inference applications, to achieve optimal system performance. In cloud datacenters, inference applications often require few resources, so giving one inference application exclusive use of a GPU can waste a significant share of that GPU. Existing work mainly focuses on co-locating multiple inference applications in datacenters using MPS (Multi-Process Service). This execution pattern has several problems: datacenters may remain in a low-workload state for long periods because of the diurnal pattern of inference applications, MPS-based data sharing can lead to interaction errors between contexts, and resource contention may cause Quality of Service (QoS) violations. To solve these problems, we propose ArkGPU, a runtime system that dynamically allocates resources. ArkGPU improves the resource utilization of the cluster while guaranteeing the QoS of inference applications. It comprises a performance predictor, a scheduler, a resource limiter, and an adjustment unit. We conduct extensive experiments on the NVIDIA V100 GPU to verify the effectiveness of ArkGPU. ArkGPU achieves high goodput for latency-critical applications, with an average throughput increase of 584.27% compared to MPS. When multiple applications are deployed simultaneously on ArkGPU, goodput improves by 94.98% compared to k8s-native and by 38.65% compared to MPS.
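The abstract reports results in terms of goodput without defining it. As a rough illustration only, the sketch below assumes the common meaning of the term: the rate of requests that complete within their QoS latency target. The names `Request`, `goodput`, and `relative_improvement`, and the numbers in the usage note, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Request:
    # Hypothetical record of one completed inference request.
    latency_ms: float


def goodput(requests: Iterable[Request], qos_target_ms: float, window_s: float) -> float:
    """Requests per second that met the QoS latency target (assumed definition)."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    # Count only the requests whose latency is within the target.
    met = sum(1 for r in requests if r.latency_ms <= qos_target_ms)
    return met / window_s


def relative_improvement(new: float, baseline: float) -> float:
    """Percentage improvement of `new` over `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0
```

For example, with a 100 ms target and a one-second window, three requests with latencies of 10 ms, 50 ms, and 120 ms give a goodput of 2 requests per second, because the third request misses the target.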