Tiling for Performance Tuning on Different Models of GPUs

Chang Xu, Steven R. Kirk, Samantha Jenkins
DOI: https://doi.org/10.48550/arXiv.1001.1718
2010-01-12
Abstract: Using CUDA-compatible GPUs as a parallel computation platform to improve program performance has gained increasingly wide acceptance in the two years since the CUDA platform was released. Its benefits extend from graphics to many other computationally intensive domains. Tiling, the most general and important optimization technique, is widely used in CUDA programs. However, new GPU models with improved compute capabilities have since been released, along with new versions of the CUDA SDK. These updated compute capabilities must be considered when optimizing with the tiling technique. In this paper, we implement image interpolation algorithms as a test case to discuss how different tiling strategies affect a program's performance. We focus in particular on how different GPU models affect the effectiveness of tiling by executing the same program on two test platforms equipped with different GPU models. The results demonstrate that a tiling strategy optimized for one GPU model is not always a good solution when executed on other GPU models, especially when some external conditions change.
Distributed, Parallel, and Cluster Computing; Performance
What problem does this paper attempt to address?
This paper explores how different tiling strategies affect program performance across GPU models. Using image interpolation algorithms as a test case, it runs the same program on two test platforms equipped with different GPU models and checks whether a tiling strategy optimized for one model still performs well on the other. The main contribution is to show that a tiling strategy tuned for one GPU model does not necessarily keep its performance advantage on other GPU models, especially when external conditions change. When optimizing CUDA programs, it is therefore necessary to consider the compute capability and characteristics of the target GPU when selecting a tile size.
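To make the idea of a tunable tile size concrete, here is a minimal CPU-side sketch in plain Python. It is not the authors' implementation: the choice of bilinear interpolation and the names `bilinear_sample`, `upscale_tiled`, and `grid_dims` are assumptions made for illustration. The outer loops over tiles play the role of CUDA thread blocks and the inner loops play the role of threads within a block. The tile shape changes only how the work is partitioned, not the result.

```python
import math


def bilinear_sample(img, y, x):
    """Bilinearly sample a list-of-lists image at fractional coordinates (y, x)."""
    h, w = len(img), len(img[0])
    y0 = min(int(y), h - 1)
    x0 = min(int(x), w - 1)
    y1 = min(y0 + 1, h - 1)  # clamp to the last row at the border
    x1 = min(x0 + 1, w - 1)  # clamp to the last column at the border
    fy, fx = y - y0, x - x0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def grid_dims(out_h, out_w, tile_h, tile_w):
    """Number of tiles needed along each axis (ceiling division)."""
    return math.ceil(out_h / tile_h), math.ceil(out_w / tile_w)


def upscale_tiled(img, scale, tile_h, tile_w):
    """Upscale img by an integer factor, processing the output tile by tile.

    tile_h and tile_w are the tunable parameters (hypothetical names);
    on a GPU they would correspond to the thread-block shape.
    """
    h, w = len(img), len(img[0])
    out_h, out_w = h * scale, w * scale
    out = [[0.0] * out_w for _ in range(out_h)]
    # Outer loops: one iteration per tile (analogous to a thread block).
    for ty in range(0, out_h, tile_h):
        for tx in range(0, out_w, tile_w):
            # Inner loops: one iteration per output pixel in the tile
            # (analogous to a thread); partial edge tiles are clipped.
            for oy in range(ty, min(ty + tile_h, out_h)):
                for ox in range(tx, min(tx + tile_w, out_w)):
                    out[oy][ox] = bilinear_sample(img, oy / scale, ox / scale)
    return out


if __name__ == "__main__":
    image = [[0, 10], [20, 30]]
    for tile in [(1, 1), (2, 3), (16, 16)]:
        result = upscale_tiled(image, 2, *tile)
        print(tile, grid_dims(4, 4, *tile), result[1][1])
```

In a real CUDA program the same pair of parameters would set the block dimensions and the amount of data staged per block, which is why the best value can differ between GPU models. Wrapping the call in a timing loop over candidate tile shapes on each target device is one simple way to compare them.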