An efficient implementation of double precision 1-D FFT for GPUs Using CUDA

Yanjun Liu, Licai Guo, Bin Luo, Xingyi Zhang
2012-01-01
Journal of Information and Computational Science
Abstract: The Fast Fourier Transform (FFT) is a well-known and widely used tool in many scientific and engineering fields. CUFFT, NVIDIA's FFT library included in the CUDA toolkit, supports double-precision FFTs, but its implementation is not very efficient. In this paper, we implement an efficient double-precision Cooley-Tukey algorithm for GPUs using CUDA. Several programming techniques are employed to exploit the hardware characteristics: using on-chip shared memory, removing redundant computation, and coalescing global memory accesses. Experiments show that our 1-D FFT is as fast as CUFFT and, for small input sizes, more than twice as fast. Copyright © 2012 Binary Information Press.
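For orientation, the radix-2 Cooley-Tukey recurrence named in the abstract can be sketched on the CPU as follows. This is a minimal illustration in Python, not the authors' CUDA implementation: the function name `fft_radix2` is hypothetical, the recursive decimation-in-time form is an assumed variant, and none of the GPU-specific techniques (shared memory, coalesced access) are represented. The one point of contact is that each twiddle product is computed once and reused for both butterfly outputs, a small instance of removing redundant computation.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT (hypothetical helper).

    Python floats are IEEE 754 doubles, so the arithmetic is double precision.
    The input length must be a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")
    if n == 1:
        return [complex(x[0])]

    # Split into even- and odd-indexed halves and transform each.
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])

    half = n // 2
    out = [0j] * n
    for k in range(half):
        # Compute the twiddle product once; reuse it for both outputs.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out
```

Calling `fft_radix2([1.0, 2.0, 3.0, 4.0])` returns the four-point DFT of the input, which can be checked by hand against the definition.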