Streaming FFT Asynchronously on Graphics Processor Units

Zhao Lili, Zhang Shengbing, Zhang Meng, Zhang Yi
DOI: https://doi.org/10.1109/IFITA.2010.76
2010-01-01
Abstract: The Fast Fourier Transform (FFT), a memory-access-intensive algorithm that follows a divide-and-conquer strategy, is one of the most important and heavily used kernels in scientific computing. The newest generation of Graphics Processor Units (GPUs) implements a stream architecture in addition to acting as a powerful massively parallel coprocessor. Furthermore, the introduction of APIs for general-purpose computation on GPUs makes them an attractive choice for high-performance numerical and scientific computing. In this work we implement the FFT on a novel NVIDIA GPU using the CUDA programming model. By optimizing the organization of the signal data, exploiting the memory hierarchy, and associating streams with different operations, we efficiently overlap kernel execution and data transfer. Our results indicate a significant performance improvement over GPU-based and CPU-based FFT algorithms. On average, the speedup is 18 percent higher than that of the original GPU-based implementation.
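To illustrate why overlapping kernel execution and data transfer can pay off, the following is a minimal timing model, not the authors' implementation. It assumes the signal is split into equal chunks and that the device has one host-to-device copy engine, one compute engine, and one device-to-host copy engine that can all run concurrently; hardware with a single copy engine would overlap less. The function names and the per-chunk costs are hypothetical and chosen only for illustration.

```python
def serial_time(n_chunks, t_h2d, t_kernel, t_d2h):
    """Time when each chunk is copied in, transformed, and copied out
    strictly one after another (no overlap between stages)."""
    return n_chunks * (t_h2d + t_kernel + t_d2h)


def streamed_time(n_chunks, t_h2d, t_kernel, t_d2h):
    """Time when each chunk is issued on its own stream.

    Assumption: three independent engines (upload, compute, download),
    each handling chunks in issue order. A stage of a chunk starts once
    its engine is free and the previous stage of that chunk has finished.
    """
    h2d_free = 0.0     # time at which the upload engine becomes idle
    kernel_free = 0.0  # time at which the compute engine becomes idle
    d2h_free = 0.0     # time at which the download engine becomes idle
    for _ in range(n_chunks):
        h2d_done = h2d_free + t_h2d
        h2d_free = h2d_done
        kernel_done = max(h2d_done, kernel_free) + t_kernel
        kernel_free = kernel_done
        d2h_done = max(kernel_done, d2h_free) + t_d2h
        d2h_free = d2h_done
    return d2h_free
```

Under these assumptions, four chunks with costs of 1, 2, and 1 time units for upload, kernel, and download take 16 units serially and 10 units when streamed: once the pipeline is full, only the slowest stage determines the throughput. The gain reported in the abstract depends on the actual hardware and transfer sizes, which this model does not capture.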