Study on Optimization Techniques for Memory Accesses of CUDA Parallel Programs

Zou Yan, Yang Zhiyi, Zhang Kailong
DOI: https://doi.org/10.16526/j.cnki.11-4762/tp.2009.12.033
2009-01-01
Abstract: We analyze the distinctive features of CUDA (Compute Unified Device Architecture) and its memory-access mechanism, summarize the typical memory-access problems in CUDA parallel programs, and present optimization strategies targeting non-coalesced global-memory accesses and shared-memory bank conflicts. Using a histogram equalization algorithm as the test case, we compare the execution times of the original and optimized programs. The experimental results show that the more pixels an image has, the greater the benefit of the optimization.
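The abstract names the two problems but does not include the authors' kernels, which target histogram equalization. As a generic illustration only, the classic tiled matrix transpose shows both fixes in one kernel. Global memory is read and written row-wise, so consecutive threads touch consecutive addresses and each warp's accesses coalesce. The column-wise access that a transpose needs is moved into a shared-memory tile, and that tile is padded by one column so the column reads do not all land in the same bank. The names `TILE_DIM`, `BLOCK_ROWS`, `transposeCoalesced` and `transposeOnGpu` are hypothetical. The bank arithmetic in the comments assumes 32 four-byte banks, as on GPUs newer than the 2009-era hardware the paper studied. That hardware had 16 banks, and because 33 mod 16 = 1 the same padding also removes conflicts there.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical tuning constants: each block is TILE_DIM x BLOCK_ROWS threads
// and transposes one TILE_DIM x TILE_DIM tile (each thread moves 4 elements).
#define TILE_DIM 32
#define BLOCK_ROWS 8

// Transposes `in` (height rows x width cols, row-major) into `out`
// (width rows x height cols, row-major).
__global__ void transposeCoalesced(float* out, const float* in, int width, int height)
{
    // One column of padding: tile[r][c] sits at word r*33 + c, i.e. bank (r + c) % 32,
    // so reading a tile column touches 32 distinct banks instead of one.
    __shared__ float tile[TILE_DIM][TILE_DIM + 1];

    int x = blockIdx.x * TILE_DIM + threadIdx.x;  // column in `in`
    int y = blockIdx.y * TILE_DIM + threadIdx.y;  // row in `in`

    // Coalesced load: adjacent threadIdx.x values read adjacent floats of one row.
    for (int j = 0; j < TILE_DIM; j += BLOCK_ROWS)
        if (x < width && y + j < height)
            tile[threadIdx.y + j][threadIdx.x] = in[(y + j) * width + x];

    __syncthreads();

    // Swap the block coordinates so the store is row-contiguous in `out` as well.
    x = blockIdx.y * TILE_DIM + threadIdx.x;  // column in `out`
    y = blockIdx.x * TILE_DIM + threadIdx.y;  // row in `out`

    // Coalesced store; the transposed element is read down a (padded) tile column.
    for (int j = 0; j < TILE_DIM; j += BLOCK_ROWS)
        if (x < height && y + j < width)
            out[(y + j) * height + x] = tile[threadIdx.x][threadIdx.y + j];
}

// Hypothetical host helper: copies `in` to the device, transposes it and copies
// the result back into `out`. Only the launch and the final copy are checked,
// to keep the sketch short.
cudaError_t transposeOnGpu(float* out, const float* in, int width, int height)
{
    assert(width > 0 && height > 0);
    size_t bytes = (size_t)width * height * sizeof(float);
    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, in, bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE_DIM, BLOCK_ROWS);
    dim3 grid((width + TILE_DIM - 1) / TILE_DIM, (height + TILE_DIM - 1) / TILE_DIM);
    transposeCoalesced<<<grid, block>>>(dOut, dIn, width, height);

    cudaError_t err = cudaGetLastError();  // reports launch failures
    if (err == cudaSuccess)
        err = cudaMemcpy(out, dOut, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel

    cudaFree(dIn);
    cudaFree(dOut);
    return err;
}
```

Deleting the `+ 1` from the tile declaration still gives a correct result, but each column read then becomes a 32-way bank conflict. Skipping the tile and reading `in` column by column also stays correct, but it makes the global loads non-coalesced. These are the two problems the abstract targets.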