Kernel concurrency opportunities based on GPU benchmarks characterization

Pablo Carvalho, Rommel Cruz, Lucia M. A. Drummond, Cristiana Bentes, Esteban Clua, Edson Cataldo, Leandro A. J. Marzulo
DOI: https://doi.org/10.1007/s10586-018-02901-1
2019-01-17
Cluster Computing
Abstract: Graphical Processing Units (GPUs) have become an important platform for general-purpose computing, thanks to their high performance and low cost compared to CPUs. Modern GPU architectures are constantly evolving, with growing resources. To take advantage of all the available resources and increase GPU efficiency, new-generation GPUs include support for concurrent kernel execution: different kernels can execute at the same time and share the GPU resources. Benchmark suites developed to evaluate GPU performance and scalability should therefore take this aspect into account, which can differ considerably from traditional CPU benchmarks. Nowadays, SHOC, Parboil, and Rodinia are the main benchmark suites for evaluating GPUs. This work analyzes these benchmark suites in a novel way. We propose to categorize the kernels of each application of these benchmarks by multiple criteria, built on their behavior in terms of computation type (integer or float), usage of the memory hierarchy, efficiency, and hardware occupancy. Based on the characterization results, we analyze kernel concurrency opportunities. The focus is on disclosing the resource requirements of the kernels of these benchmarks and on explaining their behavior when executed concurrently.
What problem does this paper attempt to address?
This paper addresses how to use the concurrent kernel execution capabilities of modern GPUs to improve overall system throughput. Specifically, the authors analyze three major GPU benchmark suites (Rodinia, Parboil, and SHOC), classify the kernels in these benchmarks by multiple criteria, and study both their behavior during concurrent execution and their GPU resource requirements. This helps explain how different types of kernels perform when executed concurrently, which in turn can guide more efficient concurrent scheduling strategies. The main objectives of the paper include:
1. **Analyze benchmark suites**: Conduct a detailed analysis of the kernels in the three mainstream GPU benchmark suites, Rodinia, Parboil, and SHOC.
2. **Classify kernels**: Classify the kernels according to their resource usage (such as integer operations, single-precision and double-precision floating-point operations, SM efficiency, GPU occupancy, and memory operations).
3. **Evaluate concurrent execution opportunities**: Experimentally evaluate the opportunities for concurrent execution of different types of kernels, with particular attention to how resource requirements affect the effectiveness of concurrent execution.
Through these analyses, the authors aim to reveal the behavioral characteristics of kernels during concurrent execution and to provide a basis for designing more efficient GPU scheduling strategies.
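To make the link between kernel classification and concurrency opportunities more concrete, the following is a minimal Python sketch, not the authors' method. It labels a kernel by its dominant computation type and checks whether one thread block from each of two kernels could fit on the same streaming multiprocessor (SM) at once. The `KernelProfile` fields, the `SM_LIMITS` values, and the function names are all assumptions chosen for illustration; real limits depend on the GPU model, and real hardware applies further constraints (allocation granularity, maximum resident blocks per SM) that this sketch ignores.

```python
from dataclasses import dataclass

# Hypothetical per-SM resource limits, for illustration only.
# Real values vary by GPU architecture.
SM_LIMITS = {"threads": 2048, "registers": 65536, "shared_mem_bytes": 49152}


@dataclass(frozen=True)
class KernelProfile:
    """Illustrative per-kernel characterization (all fields are assumptions)."""
    name: str
    threads_per_block: int
    registers_per_thread: int
    shared_mem_per_block: int  # bytes
    int_ops: int               # integer operation count
    sp_ops: int                # single-precision floating-point operation count
    dp_ops: int                # double-precision floating-point operation count


def dominant_compute_type(k: KernelProfile) -> str:
    """Return the computation type with the highest operation count."""
    counts = {"integer": k.int_ops, "single": k.sp_ops, "double": k.dp_ops}
    return max(counts, key=counts.get)


def block_demand(k: KernelProfile) -> dict:
    """Resources that a single thread block of this kernel requires."""
    return {
        "threads": k.threads_per_block,
        "registers": k.threads_per_block * k.registers_per_thread,
        "shared_mem_bytes": k.shared_mem_per_block,
    }


def can_share_sm(a: KernelProfile, b: KernelProfile, limits=SM_LIMITS) -> bool:
    """True if one block of each kernel fits on one SM at the same time."""
    da, db = block_demand(a), block_demand(b)
    return all(da[r] + db[r] <= limits[r] for r in limits)
```

Under this simplified model, two kernels with small per-block footprints can be co-resident on an SM, while a kernel whose blocks consume most of the registers or shared memory leaves little room for a partner. This is one way resource requirements can limit the benefit of concurrent execution.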