Fair and Cache Blocking Aware Warp Scheduling for Concurrent Kernel Execution on GPU

Chen Zhao, Wu Gao, Feiping Nie, Fei Wang, Huiyang Zhou
DOI: https://doi.org/10.1016/j.future.2020.05.023
IF: 7.307
2020-01-01
Future Generation Computer Systems
Abstract: With Graphics Processing Units (GPUs) being widely adopted in data centers to provide computing power, efficient support for GPU multitasking has attracted significant attention. Prior GPU multitasking work includes spatial multitasking and simultaneous multitasking (SMK). Spatial multitasking allocates GPU resources at the coarse granularity of streaming multiprocessors (SMs), whereas SMK runs concurrent kernels on the same SM and is therefore fine-grained. SMK can improve GPU resource utilization, especially when the concurrent kernels have complementary characteristics. However, the main challenge for SMK is interference among the concurrent kernels, especially contention for the data cache. In this paper, we propose a fair and cache blocking aware warp scheduling (FCBWS) approach that mitigates data cache contention and improves SMK on GPUs. FCBWS gives each kernel an equal opportunity to issue instructions and tries to avoid memory pipeline stalls by predicting cache blocking. To construct concurrent kernel execution benchmarks, kernels are extracted from various applications. Simulation results show that FCBWS outperforms previous multitasking methods; even compared with the state-of-the-art SMK method, FCBWS improves system throughput (STP) by 10% and reduces average normalized turnaround time (ANTT) by 41% on average.
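The two ideas in the abstract (equal issue opportunity per kernel, and skipping memory instructions when the cache is predicted to block) can be illustrated with a small Python model. This is a sketch under stated assumptions, not the authors' design: the fairness rule (always favour the kernel that has issued the fewest instructions) and the blocking predictor (treat the L1 data cache as blocked once a hypothetical count of outstanding misses reaches a hypothetical `mshr_capacity`) are illustrative stand-ins, and every name (`Warp`, `FairCacheAwareScheduler`, `pick`, `next_is_mem`) is hypothetical. The `stp` and `antt` helpers follow the conventional multiprogram metric definitions, where `t_alone` is each kernel's execution time when run in isolation and `t_shared` is its time under concurrent execution; we assume the paper's STP and ANTT use these standard forms.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Warp:
    kernel_id: int
    warp_id: int
    next_is_mem: bool  # hypothetical flag: next instruction is a load/store


@dataclass
class FairCacheAwareScheduler:
    """Toy warp selector; not the FCBWS hardware from the paper."""
    num_kernels: int
    mshr_capacity: int  # hypothetical blocking proxy: max outstanding L1 misses
    outstanding_misses: int = 0  # assumed to be updated by a memory model
    issued: List[int] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        # Per-kernel count of issued instructions, used for fairness.
        self.issued = [0] * self.num_kernels

    def cache_would_block(self) -> bool:
        # Assumed predictor: the L1 blocks once all miss slots are occupied.
        return self.outstanding_misses >= self.mshr_capacity

    def pick(self, ready: List[Warp]) -> Optional[Warp]:
        blocked = self.cache_would_block()
        # Skip memory warps that would stall the memory pipeline.
        candidates = [w for w in ready if not (w.next_is_mem and blocked)]
        if not candidates:
            return None  # nothing can issue this cycle without stalling
        # Fairness: favour the kernel with the fewest issued instructions;
        # ties are broken by kernel id, then warp id.
        chosen = min(
            candidates,
            key=lambda w: (self.issued[w.kernel_id], w.kernel_id, w.warp_id),
        )
        self.issued[chosen.kernel_id] += 1
        return chosen


def stp(t_alone: List[float], t_shared: List[float]) -> float:
    # System throughput: sum of per-kernel normalized progress (higher is better).
    return sum(a / s for a, s in zip(t_alone, t_shared))


def antt(t_alone: List[float], t_shared: List[float]) -> float:
    # Average normalized turnaround time: mean slowdown (lower is better).
    return sum(s / a for a, s in zip(t_alone, t_shared)) / len(t_alone)
```

With no interference, STP for n concurrent kernels equals n and ANTT equals 1, so a higher STP and a lower ANTT both indicate less slowdown from sharing the SM. Picking the least-served kernel is only one simple way to equalize issue opportunity; the paper's actual fairness and blocking-prediction mechanisms may differ.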