Fast Fourier transforms for the evaluation of convolution products: CPU versus GPU implementation

B. Van de Wiele, A. Vansteenkiste, B. Van Waeyenberge, L. Dupré, D. De Zutter
DOI: https://doi.org/10.1002/jnm.1960
2013-12-18
Abstract: Many research areas involve convolution products that relate a physical quantity at a set of observation points to its sources. When the sources and the observation points coincide, the numerical evaluation of the physical quantity typically leads to numerical problems of order N². Here, fast Fourier transforms (FFTs) are widely used to reduce the computations to order N log N complexity. When FFTs are adopted for finite physical problems, zero padding is required. Hence, in 2D and 3D problems, the evaluation of the convolution product can be optimized by skipping the Fourier transforms on arrays that contain only zeros in the forward 2D or 3D FFT scheme, and on the corresponding arrays in the inverse 2D or 3D FFT scheme. This paper describes the implementation of such an approach on graphical processing units (GPUs) and compares the time gains on GPU and on CPU. It is found that on CPU the speedup corresponds to the theoretical limit, while in the GPU implementation the memory bandwidth limits the speedup ratio. Copyright © 2013 John Wiley & Sons, Ltd.
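The idea of skipping transforms on all-zero arrays can be illustrated with a minimal 2D sketch in Python with NumPy. This is not the authors' CPU or GPU implementation. The function names (`pad_kernel`, `conv_direct`, `conv_full_fft`, `conv_pruned_fft`), the padded size of 2n per dimension, and the kernel layout (one value per displacement from -(n-1) to n-1) are all assumptions made for illustration. The pruned version transforms only the n0 non-zero rows in the forward pass and only the n0 rows that hold observation points in the inverse pass.

```python
import numpy as np

def pad_kernel(K, n0, n1):
    """Store a displacement kernel K of shape (2*n0-1, 2*n1-1) in a
    (2*n0, 2*n1) array with wrap-around indexing (assumed layout)."""
    p0, p1 = 2 * n0, 2 * n1
    idx0 = np.arange(-(n0 - 1), n0) % p0
    idx1 = np.arange(-(n1 - 1), n1) % p1
    padded = np.zeros((p0, p1))
    padded[np.ix_(idx0, idx1)] = K
    return padded

def conv_direct(src, K):
    """Order N^2 reference: out[i] = sum_j K[i - j] * src[j]."""
    n0, n1 = src.shape
    out = np.zeros((n0, n1))
    for i0 in range(n0):
        for i1 in range(n1):
            # Reversed window so that entry j holds K at displacement i - j.
            window = K[i0:i0 + n0, i1:i1 + n1][::-1, ::-1]
            out[i0, i1] = np.sum(window * src)
    return out

def conv_full_fft(src, kernel_hat):
    """Zero-pad the sources and transform every row and column."""
    n0, n1 = src.shape
    padded = np.zeros(kernel_hat.shape)
    padded[:n0, :n1] = src
    out = np.fft.ifft2(np.fft.fft2(padded) * kernel_hat)
    return out[:n0, :n1].real

def conv_pruned_fft(src, kernel_hat):
    """Same result, but 1D transforms on all-zero rows are skipped."""
    n0, n1 = src.shape
    p0, p1 = kernel_hat.shape
    # Forward, axis 1: only the n0 rows that contain sources are transformed;
    # the remaining p0 - n0 rows are zero and their transform is zero.
    rows = np.fft.fft(src, n=p1, axis=1)      # shape (n0, p1)
    # Forward, axis 0: every column is non-zero now, so all are transformed.
    spec = np.fft.fft(rows, n=p0, axis=0)     # shape (p0, p1)
    spec = spec * kernel_hat
    # Inverse, axis 0: all columns are needed.
    cols = np.fft.ifft(spec, axis=0)          # shape (p0, p1)
    # Inverse, axis 1: only the n0 rows with observation points are needed.
    out = np.fft.ifft(cols[:n0], axis=1)      # shape (n0, p1)
    return out[:, :n1].real

# Small usage example with random data.
rng = np.random.default_rng(0)
src = rng.standard_normal((4, 5))
K = rng.standard_normal((7, 9))
kernel_hat = np.fft.fft2(pad_kernel(K, 4, 5))
result = conv_pruned_fft(src, kernel_hat)
```

In this 2D sketch, half of the 1D transforms along the second axis are skipped in both the forward and the inverse pass. NumPy gives no direct control over memory traffic, so the sketch says nothing about the bandwidth limit reported for the GPU.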
engineering, electrical & electronic; mathematics, interdisciplinary applications