Overcoming Limitations of GPGPU-Computing in Scientific Applications

Connor Kenyon, Glenn Volkema, Gaurav Khanna

DOI: https://doi.org/10.48550/arXiv.1905.05175

2019-05-10

Abstract: The performance of discrete general purpose graphics processing units (GPGPUs) has been improving at a rapid pace. The PCIe interconnect that controls the communication of data between the system host memory and the GPU has not improved as quickly, leaving a gap in performance due to GPU downtime while waiting for PCIe data transfer. In this article, we explore two alternatives to the limited PCIe bandwidth: the NVIDIA NVLink interconnect and zero-copy algorithms for shared-memory Heterogeneous System Architecture (HSA) devices. The OpenCL SHOC benchmark suite is used to measure the performance of each device on various scientific application kernels.

Computational Physics; Distributed, Parallel, and Cluster Computing

What problem does this paper attempt to address?

This paper addresses a GPU performance bottleneck in high-performance computing (HPC): the limited bandwidth of the PCIe bus. It explores two alternatives for reducing the impact of that limit on GPU computing:

1. **NVIDIA NVLink**: A high-speed, high-bandwidth communication protocol developed by NVIDIA for data transfer between the host system and a GPGPU, as well as between multiple GPGPUs within the same system. NVLink's peak communication rate is 300 GB/s, about 10 times that of PCIe 3.0.
2. **Zero-copy algorithms**: A technique for Heterogeneous System Architecture (HSA) devices. Because the CPU and GPU share memory and operate within the same memory space, data does not need to be transferred back and forth between them.

The paper uses the OpenCL SHOC benchmark suite to measure and compare the performance of different devices on various scientific application kernels, and so evaluates how effective these alternatives are. The main objective is to show how these technologies can significantly improve the performance of scientific computing and reduce energy costs.
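To make the bandwidth argument concrete, here is a minimal back-of-the-envelope sketch in Python. It is not taken from the paper and does not reproduce any SHOC measurement. It assumes an ideal transfer (no latency or protocol overhead), takes the 300 GB/s NVLink peak quoted above, and derives the PCIe 3.0 figure from the stated factor of 10. The buffer size, kernel time, and function names are hypothetical. The zero-copy case is modeled simply as "no explicit copies", assuming the kernel runs equally fast on every device, which is unlikely to hold in practice.

```python
# Back-of-the-envelope model of host <-> GPU data movement.
# All numbers are nominal peaks or made-up inputs, not measured results.

GB = 1e9  # bytes per gigabyte (decimal)

NVLINK_PEAK = 300 * GB         # peak rate quoted in the summary
PCIE3_PEAK = NVLINK_PEAK / 10  # implied by the "10 times" ratio (assumption)


def transfer_seconds(num_bytes, bandwidth):
    """Ideal one-way transfer time, ignoring latency and overhead."""
    return num_bytes / bandwidth


def total_seconds(num_bytes, kernel_seconds, bandwidth=None):
    """Copy in, run the kernel, copy out.

    bandwidth=None models a zero-copy run on a shared-memory device,
    where no explicit host/device copies are made.
    """
    if bandwidth is None:
        return kernel_seconds
    return 2 * transfer_seconds(num_bytes, bandwidth) + kernel_seconds


if __name__ == "__main__":
    size = 3 * GB   # hypothetical working set
    kernel = 0.05   # hypothetical kernel run time in seconds
    cases = [("PCIe 3.0", PCIE3_PEAK), ("NVLink", NVLINK_PEAK), ("zero-copy", None)]
    for name, bw in cases:
        print(f"{name:10s} {total_seconds(size, kernel, bw):.3f} s")
```

Under these assumptions the transfer term dominates on PCIe and shrinks tenfold on NVLink, which is the gap the paper sets out to measure with real kernels.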
