GPGPU Memory Estimation and Optimization Targeting OpenCL Architecture

Junfeng Zhu, Gang Chen, Baifeng Wu
DOI: https://doi.org/10.1109/cluster.2012.9
2012-01-01
Abstract: The enormous computational power of modern graphics processing units (GPUs) has led to their wide use in general-purpose applications. However, manually developing high-performance parallel code for GPUs is still very challenging. To fully exploit the capability of GPUs for general-purpose computing on heterogeneous processing platforms, we propose performance estimation and optimization methods targeting the OpenCL architecture. Our approach uses a polyhedral representation of the source program to optimize and allocate the global memory and fast memory of GPUs. By checking the memory access patterns of the program, we discover access instances that can be grouped together using graph coloring. We then estimate the memory performance of the program, with the aim of eliminating uncoalesced global memory accesses. Next, we use data space transformation to alter irregular memory access patterns, improving off-chip memory bandwidth by taking advantage of vector data types. Meanwhile, we detect reuse information to allocate data into distinct fast memory regions according to both the properties of the data accesses and the characteristics of the OpenCL memory model, so as to make the best use of the fast on-chip memory. Experimental results on an AMD/ATI HD5850 GPU for a set of commonly used benchmarks show speedups of 2.1X to 6.7X over the unoptimized versions, and show that the proposed global memory performance model estimates global memory performance relatively accurately.
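The idea behind estimating global memory performance and removing uncoalesced accesses can be illustrated with a deliberately simplified model. This is not the paper's performance model. It is a minimal sketch, assuming that a hypothetical group of 16 work-items has its accesses combined into aligned 128-byte segments, and that each access is an affine index function of the work-item id. The names `SEGMENT_BYTES`, `GROUP_SIZE`, `transactions_per_group`, `is_coalesced` and `total_transactions` are illustrative assumptions, and real coalescing rules differ between GPUs, including the HD5850. The last lines mimic the effect of a vector-type data space transformation: four stride-4 scalar reads per work-item are replaced by a single `float4`-sized read.

```python
from typing import Callable, List, Tuple

# Hypothetical parameters for illustration only; real values depend on the GPU.
SEGMENT_BYTES = 128  # assumed size of one aligned global-memory segment
GROUP_SIZE = 16      # assumed number of work-items whose accesses are combined

IndexFn = Callable[[int], int]


def transactions_per_group(index: IndexFn, elem_bytes: int, first_item: int = 0) -> int:
    """Count the distinct segments touched by one group of work-items.

    `index` maps a work-item id to the element it accesses,
    e.g. lambda t: 4 * t + 1 models a[4*t + 1].
    """
    segments = {
        (index(t) * elem_bytes) // SEGMENT_BYTES
        for t in range(first_item, first_item + GROUP_SIZE)
    }
    return len(segments)


def is_coalesced(index: IndexFn, elem_bytes: int) -> bool:
    """Treat an access as coalesced if it needs no more than the minimum segment count."""
    minimum = (GROUP_SIZE * elem_bytes + SEGMENT_BYTES - 1) // SEGMENT_BYTES
    return transactions_per_group(index, elem_bytes) <= minimum


def total_transactions(accesses: List[Tuple[IndexFn, int]]) -> int:
    """Sum segment transactions over several (index function, element size) accesses."""
    return sum(transactions_per_group(f, b) for f, b in accesses)


# Four scalar reads a[4t+k] (4-byte floats) versus one 16-byte vector read per work-item.
scalar_reads = [(lambda t, k=k: 4 * t + k, 4) for k in range(4)]
vector_reads = [(lambda t: t, 16)]
print(total_transactions(scalar_reads), total_transactions(vector_reads))  # prints: 8 2
```

Under these assumptions, a unit-stride float access needs one segment per group and counts as coalesced. A stride-32 float access touches 16 separate segments. Merging the stride-4 scalar reads into one vector read cuts the modeled traffic from 8 transactions to 2.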