HeteroCore GPU to Exploit TLP-Resource Diversity

Xia Zhao, Zhiying Wang, Lieven Eeckhout
DOI: https://doi.org/10.1109/tpds.2018.2854764
IF: 5.3
2019-01-01
IEEE Transactions on Parallel and Distributed Systems
Abstract: Graphics processing units (GPUs) are widely adopted as compute accelerators in cloud computing environments and supercomputers. Sharing GPU resources in such environments requires effective multitasking support. Unfortunately, conventional GPUs lack the ability to adapt to diverse thread-level parallelism (TLP) resource demands among co-executing kernels. Previous work such as SM partitioning and simultaneous multikernel (SMK) increases system throughput but significantly degrades per-application performance. This paper proposes the HeteroCore GPU to significantly improve multitasking performance at an area cost similar to that of a conventional GPU. After rebalancing TLP-related SM resources, a HeteroCore GPU consists of two types of SMs to support diverse TLP-resource demands. Dynamic scheduling performs low-overhead spatial profiling at runtime across the different SM types and steers scheduling decisions based on the TLP-resource demands of the co-executing kernels. Compared to a conventional GPU, the HeteroCore GPU improves system throughput by 20.1 percent on average (up to 80.9 percent) and per-application performance by 29.8 percent on average (up to 50.3 percent) for workload mixes composed of kernels with different TLP-resource demands.