Optimizing the Aggregated Throughput of GPUs in Public Clouds Based on Adaptive Kernel Reordering.

Jingjin Du, Quan Chen, Minyi Guo
DOI: https://doi.org/10.1109/icpads47876.2019.00058
2019-01-01
Abstract: GPGPUs are widely used in public clouds for their high computational ability, and a single GPU is often shared by multiple tenants. When multiple applications share a single GPU, their kernels time-share the GPU, and the frequent context switches incur heavy overhead and consequently degrade the aggregated throughput of the GPU. To improve the aggregated throughput of GPUs, we design a system that eliminates unnecessary context switches through adaptive kernel reordering. This is challenging because emerging real-system GPUs encapsulate the kernel scheduling policy in the device driver, and there is no open interface to schedule or reorder kernels. To resolve this challenge, the designed system reorders kernels of different applications before they are transmitted to the GPU. By scheduling kernels of the same application for as long as possible, unless the interval between adjacent kernels of an application is too long, the cost of context switching can be reduced. Specifically, we propose three kernel reordering methods and test them on an NVIDIA P100 GPU. Our experimental results show that the aggregated throughput is improved by up to 31.07% when multiple applications share a GPU.
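The reordering rule stated in the abstract (keep dispatching kernels of the same application unless the gap to its next kernel is too long) can be illustrated with a small host-side simulation. This is only a minimal sketch under stated assumptions and is not one of the paper's three methods: the names `Kernel`, `reorder`, `count_switches`, and the threshold `max_gap` are hypothetical, kernel arrival times and durations are assumed to be known in advance, and a context switch is simply counted whenever two consecutive kernels belong to different applications.

```python
from collections import deque
from typing import NamedTuple


class Kernel(NamedTuple):
    app: str         # owning application (hypothetical identifier)
    arrival: float   # time at which the application submits the kernel
    duration: float  # assumed execution time on the GPU


def reorder(kernels, max_gap):
    """Greedy dispatch order: stay with the current application while its
    next kernel is available within max_gap, otherwise switch."""
    # One FIFO queue per application, preserving per-application order.
    queues = {}
    for k in sorted(kernels, key=lambda k: k.arrival):
        queues.setdefault(k.app, deque()).append(k)

    order, now, current = [], 0.0, None
    while any(queues.values()):
        q = queues.get(current)
        if q and q[0].arrival - now <= max_gap:
            nxt = q.popleft()  # same application, no context switch
        else:
            # Gap too long (or queue empty): switch to the application
            # whose head kernel was submitted earliest.
            current = min((a for a, p in queues.items() if p),
                          key=lambda a: queues[a][0].arrival)
            nxt = queues[current].popleft()
        now = max(now, nxt.arrival) + nxt.duration
        order.append(nxt)
    return order


def count_switches(order):
    """Number of adjacent kernel pairs that belong to different applications."""
    return sum(1 for a, b in zip(order, order[1:]) if a.app != b.app)


# Two applications whose kernels interleave in arrival order.
kernels = [Kernel("A", 0.0, 1.0), Kernel("B", 0.5, 1.0),
           Kernel("A", 1.0, 1.0), Kernel("B", 1.5, 1.0),
           Kernel("A", 2.0, 1.0), Kernel("B", 2.5, 1.0)]
fifo = sorted(kernels, key=lambda k: k.arrival)
reordered = reorder(kernels, max_gap=0.5)
```

In this toy input, arrival-order dispatch alternates between the two applications on every kernel, while the reordered sequence groups each application's kernels together. The choice of `max_gap` trades fewer switches against the time the GPU would sit idle waiting for the current application's next kernel.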