A quantitative roofline model for GPU kernel performance estimation using micro-benchmarks and hardware metric profiling

Elias Konstantinidis, Yiannis Cotronis
DOI: https://doi.org/10.1016/j.jpdc.2017.04.002
IF: 4.542
2017-09-01
Journal of Parallel and Distributed Computing
Abstract: The execution time of a kernel on a GPU is typically difficult to predict, as it depends on a wide range of factors. Performance can be limited by memory transfers, compute throughput or other latencies. In this paper, we improve on the roofline model following a quantitative approach and present a fully automated GPU performance prediction technique. The model uses micro-benchmarking and profiling in a “black box” fashion, so no inspection of source or binary code is required. It combines these parameters to identify the performance-limiting factor and to estimate execution time. In addition, we propose the quadrant-split visual representation, which captures the characteristics of multiple processors in relation to a particular kernel. We performed experiments on a stencil computation (red/black SOR), SGEMM and a total of 28 kernels from the Rodinia benchmark suite on six CUDA GPUs, and observed an average absolute prediction error of 27.66%. The performance model was also evaluated on an AMD GPU through the HIP programming environment, where prediction errors were comparable despite the significant architectural differences between the two vendors' GPUs.
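To illustrate the roofline idea the abstract builds on, here is a minimal sketch of the classic two-ceiling roofline time bound, together with a mean absolute relative error of the kind used to summarize prediction accuracy. This is not the authors' quantitative model, which combines additional micro-benchmarked and profiled parameters. The `Device` and `KernelProfile` classes, their fields, and the assumption that a profiler supplies a FLOP count and a DRAM byte count are all hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Device:
    # Hypothetical sustained ceilings, e.g. as a micro-benchmark might measure them.
    peak_flops: float       # floating-point operations per second
    peak_bandwidth: float   # DRAM bytes per second


@dataclass
class KernelProfile:
    # Hypothetical per-kernel counts, e.g. as hardware-metric profiling might report them.
    flops: float            # total floating-point operations executed
    dram_bytes: float       # total bytes moved to and from DRAM


def estimate_time(kernel: KernelProfile, device: Device) -> tuple[float, str]:
    """Classic roofline bound: the kernel can run no faster than its slowest resource."""
    compute_time = kernel.flops / device.peak_flops
    memory_time = kernel.dram_bytes / device.peak_bandwidth
    limiter = "compute" if compute_time >= memory_time else "memory"
    return max(compute_time, memory_time), limiter


def mean_absolute_relative_error(predicted: list[float], measured: list[float]) -> float:
    """Average of |predicted - measured| / measured over all kernels."""
    errors = [abs(p - m) / m for p, m in zip(predicted, measured)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    gpu = Device(peak_flops=10e12, peak_bandwidth=500e9)
    kernel = KernelProfile(flops=2e9, dram_bytes=4e9)
    t, limiter = estimate_time(kernel, gpu)
    print(f"estimated time: {t * 1e3:.3f} ms, limited by {limiter}")
    print(f"error vs. a 10 ms measurement: {mean_absolute_relative_error([t], [10e-3]):.0%}")
```

With these made-up numbers, the compute time is 0.2 ms and the memory time is 8 ms, so the kernel is classified as memory-bound with an 8 ms estimate. A single maximum over two ceilings is the simplest form of the bound; a quantitative model can extend the same maximum to further resources such as caches or shared memory.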
computer science, theory & methods