Overcoming Limitations of GPGPU-Computing in Scientific Applications

Connor Kenyon, Glenn Volkema, Gaurav Khanna
DOI: https://doi.org/10.48550/arXiv.1905.05175
2019-05-10
Abstract: The performance of discrete general purpose graphics processing units (GPGPUs) has been improving at a rapid pace. The PCIe interconnect that controls the communication of data between the system host memory and the GPU has not improved as quickly, leaving a gap in performance due to GPU downtime while waiting for PCIe data transfer. In this article, we explore two alternatives to the limited PCIe bandwidth: the NVIDIA NVLink interconnect, and zero-copy algorithms for shared memory Heterogeneous System Architecture (HSA) devices. The OpenCL SHOC benchmark suite is used to measure the performance of each device on various scientific application kernels.
Computational Physics; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
This paper addresses a GPU performance bottleneck in high-performance computing (HPC): the limited bandwidth of the PCIe bus, which leaves the GPU idle while it waits for data. The paper explores two alternatives that reduce the impact of this limitation on GPU computing:

1. **NVIDIA NVLink**: A high-speed, high-bandwidth communication protocol developed by NVIDIA for data transfer between the host system and a GPGPU, as well as between multiple GPGPUs within the same system. The peak communication rate of NVLink is 300 GB/s, which is 10 times faster than PCIe 3.0.
2. **Zero-Copy Algorithms**: A technique implemented on Heterogeneous System Architecture (HSA) devices. Because the CPU and GPU share memory, both can operate within the same memory space, so data does not need to be transferred back and forth between them.

The paper uses the OpenCL SHOC benchmark suite to measure and compare the performance of different devices on various scientific application kernels, and so to evaluate how effective these alternatives are. The main objective is to show how these technologies can significantly improve the performance of scientific computing and reduce energy consumption.
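
To make the bandwidth argument concrete, the following is a minimal back-of-envelope sketch, not the paper's methodology. It assumes the 300 GB/s NVLink peak quoted above and derives a PCIe 3.0 figure of 30 GB/s from the "10 times" claim. The function names, the 6 GB working set, and the 50 ms kernel time are hypothetical values chosen for illustration, and the model assumes the same volume of data is copied in and out. Zero-copy is modeled simply as the absence of any transfer, which ignores any difference in memory bandwidth on shared-memory devices.

```python
# Back-of-envelope model of offload time; all figures are illustrative assumptions.
GB = 1e9  # bytes per gigabyte (decimal)

NVLINK_PEAK_GBPS = 300.0                   # peak rate quoted in the summary
PCIE3_PEAK_GBPS = NVLINK_PEAK_GBPS / 10.0  # implied by the "10 times faster" claim


def transfer_seconds(num_bytes, link_gb_per_s):
    """Ideal time to move num_bytes over a link running at its peak rate."""
    return num_bytes / (link_gb_per_s * GB)


def offload_seconds(num_bytes, kernel_seconds, link_gb_per_s=None):
    """Total time to copy data in, run the kernel, and copy results out.

    link_gb_per_s=None models a zero-copy shared-memory device, where
    the CPU and GPU use the same memory and no copies are needed.
    """
    if link_gb_per_s is None:
        return kernel_seconds
    return 2 * transfer_seconds(num_bytes, link_gb_per_s) + kernel_seconds


if __name__ == "__main__":
    size = 6 * GB   # hypothetical 6 GB working set
    kernel = 0.05   # hypothetical 50 ms kernel execution time
    for name, rate in [("PCIe 3.0", PCIE3_PEAK_GBPS),
                       ("NVLink", NVLINK_PEAK_GBPS),
                       ("zero-copy", None)]:
        print(f"{name:10s} {offload_seconds(size, kernel, rate):.3f} s")
```

Under these assumptions the transfer term dominates on the slower link and vanishes entirely in the zero-copy case. Peak rates are upper bounds, so measured results such as those from SHOC would be expected to differ.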