Tiling for Performance Tuning on Different Models of GPUs

Chang Xu, Steven R. Kirk, Samantha Jenkins

DOI: https://doi.org/10.48550/arXiv.1001.1718

2010-01-12

Abstract: Using CUDA-compatible GPUs as a parallel computing solution to improve program performance has become increasingly widely accepted in the two years since the CUDA platform was released. Its benefits extend from the graphics domain to many other computationally intensive domains. Tiling, as the most general and important technique, is widely used for optimization in CUDA programs. However, new GPU models with better compute capabilities have been released, along with new versions of the CUDA SDK. These updated compute capabilities must be considered when optimizing with the tiling technique. In this paper, we implement image interpolation algorithms as a test case to discuss how different tiling strategies affect the program's performance. We focus especially on how different GPU models affect the effectiveness of tiling, by executing the same program on two testing platforms equipped with different GPU models. The results demonstrate that a tiling strategy optimized for one GPU model is not always a good solution when executed on other GPU models, especially when some external conditions change.
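To make the idea of tiling an image interpolation concrete, the following is a minimal CPU-side sketch in Python. It is not the authors' CUDA implementation: the function names `bilinear_sample` and `upscale_tiled`, the choice of bilinear interpolation, and the integer scale factor are all assumptions made for illustration. The outer loops stand in for the grid of thread blocks (one block per tile) and the inner loops for the threads within a block.

```python
def bilinear_sample(img, y, x):
    """Bilinearly interpolate img (a list of rows) at fractional (y, x)."""
    h, w = len(img), len(img[0])
    y0, x0 = int(y), int(x)
    # Clamp the neighbouring sample so border pixels stay inside the image.
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def upscale_tiled(img, scale, tile_h, tile_w):
    """Upscale img by an integer factor, one output tile at a time.

    tile_h and tile_w are the hypothetical tile dimensions; the result
    must not depend on them, only the traversal order does.
    """
    h, w = len(img), len(img[0])
    out_h, out_w = h * scale, w * scale
    out = [[0.0] * out_w for _ in range(out_h)]
    # Outer loops: one iteration per tile (a thread block on a GPU).
    for ty in range(0, out_h, tile_h):
        for tx in range(0, out_w, tile_w):
            # Inner loops: one iteration per pixel in the tile (a thread).
            # min() handles partial tiles at the right and bottom edges.
            for oy in range(ty, min(ty + tile_h, out_h)):
                for ox in range(tx, min(tx + tile_w, out_w)):
                    out[oy][ox] = bilinear_sample(img, oy / scale, ox / scale)
    return out
```

In this sketch the tile shape changes only the order in which output pixels are visited, so every tile size gives the same image. On a GPU the tile shape additionally determines how many threads each block holds and how memory is accessed, which is where the performance differences discussed in the abstract come from.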

Distributed, Parallel, and Cluster Computing; Performance

What problem does this paper attempt to address?

This paper explores how different tiling strategies affect program performance on different GPU models. Using an image interpolation algorithm as a test case, it runs the same program on two test platforms equipped with different GPU models and examines whether a tiling strategy optimized for one model still performs well on the other. The main contribution is showing that a tiling strategy that performs well on one GPU model does not necessarily keep its advantage on other models, especially when external conditions change. This indicates that, when optimizing CUDA programs, the compute capability and characteristics of the target GPU must be considered in order to select the most suitable tile size.
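One way to see why a tile size can suit one GPU and not another is to estimate how many threads a multiprocessor can keep resident for a given tile. The sketch below is a simplified model, not the paper's method: the function names are hypothetical, the per-multiprocessor limits (768 versus 1024 resident threads, 8 resident blocks, 512 threads per block) are illustrative values chosen to resemble early CUDA hardware generations rather than figures taken from the paper, and register and shared-memory limits are ignored.

```python
def resident_threads(tile_w, tile_h, max_threads_per_sm, max_blocks_per_sm,
                     max_threads_per_block=512):
    """Threads one multiprocessor can hold when each block is one tile.

    All limits are illustrative assumptions; real GPUs also limit
    blocks by register and shared-memory usage, which is ignored here.
    """
    threads_per_block = tile_w * tile_h
    if threads_per_block > max_threads_per_block:
        return 0  # the tile does not fit in a single block
    # Whole blocks only: a block is either fully resident or not at all.
    blocks = min(max_threads_per_sm // threads_per_block, max_blocks_per_sm)
    return blocks * threads_per_block


def occupancy(tile_w, tile_h, max_threads_per_sm, max_blocks_per_sm):
    """Fraction of the multiprocessor's thread slots that the tile fills."""
    used = resident_threads(tile_w, tile_h, max_threads_per_sm,
                            max_blocks_per_sm)
    return used / max_threads_per_sm


if __name__ == "__main__":
    # Two hypothetical GPU models: (resident threads, resident blocks) per SM.
    gpus = {"model A": (768, 8), "model B": (1024, 8)}
    for tile in [(8, 8), (16, 12), (16, 16), (32, 16)]:
        for name, (threads, blocks) in gpus.items():
            print(tile, name, round(occupancy(*tile, threads, blocks), 3))
```

Under these assumed limits, a 16x12 tile fills model A completely but leaves model B at 0.9375, while a 32x16 tile fills model B but reaches only about two thirds on model A. Occupancy is only one factor in real performance, so this should be read as an illustration of the dependence on the GPU model, not as a predictor of the paper's measurements.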

Performance Tuning for GPU-Embedded Systems: Machine-Learning-based and Analytical Model-driven Tuning Methodologies

Stencil Computations on AMD and Nvidia Graphics Processors: Performance and Tuning Strategies

DHTS: A Dynamic Hybrid Tiling Strategy for Optimizing Stencil Computation on GPUs

An Efficient Tile Size Selection Model Based on Machine Learning

CUDA Optimization Strategies for Compute- and Memory-Bound Neuroimaging Algorithms

Evaluation of Programming Models and Performance for Stencil Computation on Current GPU Architectures

Understanding the GPU Microarchitecture to Achieve Bare-Metal Performance Tuning

Parallelization and Optimization of SIFT on GPU Using CUDA

An Accurate GPU Performance Model for Effective Control Flow Divergence Optimization

A coordinated tiling and batching framework for efficient GEMM on GPUs

A quantitative performance analysis model for GPU architectures

A Performance Model for General-Purpose Computation on GPU

Accelerating Geospatial Analysis on GPUs Using CUDA

Performance Impact of Data Layout on the GPU-accelerated IDW Interpolation

Performance Modeling and Optimization of Sparse Matrix-Vector Multiplication on NVIDIA CUDA Platform

A Performance Model for GPU Architectures That Considers On-Chip Resources: Application to Medical Image Registration

Performance modeling of graphics processing unit application using static and dynamic analysis

Efficient GPU Spatial-Temporal Multitasking

Analyzing CUDA workloads using a detailed GPU simulator