Abstract: The Fast Fourier Transform (FFT) is frequently invoked in stream processing, e.g., to compute the spectral representation of audio/video frames, and in many cases the inputs are sparse, i.e., most of their Fourier coefficients are zero. Many sparse FFT algorithms have been proposed to improve FFT efficiency when inputs are known to be sparse. However, like their "dense" counterparts, existing sparse FFT implementations are input-oblivious: how the algorithm works is not affected by the input values, so the sparse FFT computation on one frame is exactly the same as the computation on the next. This paper improves upon existing sparse FFT algorithms by simultaneously exploiting input sparsity and the similarity between adjacent inputs in stream processing. Our algorithm detects and takes advantage of the similarity between input samples to automatically design and customize sparse filters that lead to better parallelism and performance. More specifically, we develop an efficient heuristic to detect the similarity between the current input and its predecessor in the stream, and when the two are found to be similar, we use the spectral representation of the predecessor to accelerate the sparse FFT computation on the current input. Given a sparse signal with only $k$ non-zero Fourier coefficients, our algorithm uses sparse approximation, tuning several adaptive filters to efficiently pack the non-zero Fourier coefficients into a small number of bins, which can then be estimated accurately. As a result, our algorithm runs in time sub-linear in the input size and eliminates recursive coefficient estimation, both of which improve parallelism and performance. Furthermore, the new heuristic detects discontinuities within a stream and resumes input adaptation very quickly.
We evaluate our input-adaptive sparse FFT implementation on an Intel i7 CPU and three NVIDIA GPUs: the GeForce GTX480, Tesla C2070, and Tesla C2075. Our algorithm is faster than previous FFT implementations both in theory and in practice. For inputs of size $N=2^{24}$, our parallel implementation outperforms FFTW for $k$ up to $2^{18}$, a sparsity threshold an order of magnitude higher than that of prior sparse algorithms. Furthermore, on the Tesla C2075 GPU our input-adaptive sparse FFT achieves speedups of up to 77.2x and 29.3x over 1-thread and 4-thread FFTW, respectively; 10.7x, 6.4x, and 5.2x over sFFT 1.0, sFFT 2.0, and CUFFT, respectively; and 6.9x over our sequential CPU implementation.
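To make the idea of reusing the predecessor's spectrum concrete, the following is a minimal sketch under strong simplifying assumptions, not the paper's algorithm. It assumes the previous frame's non-zero frequencies (`support`) are known, replaces the adaptive filters with plain time-domain subsampling (which aliases the spectrum into `num_bins` bins), and uses a hypothetical spot-check test in place of the paper's similarity heuristic. All function names and parameters are illustrative.

```python
import cmath


def synthesize(coeffs, n):
    """Build a length-n time signal from a {frequency: coefficient} map (inverse DFT)."""
    return [sum(c * cmath.exp(2j * cmath.pi * f * t / n) for f, c in coeffs.items()) / n
            for t in range(n)]


def small_dft(values):
    """Naive DFT, applied only to the short subsampled signal."""
    b = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * j * m / b) for m, v in enumerate(values))
            for j in range(b)]


def estimate_from_previous_support(x, support, num_bins, spot_checks=(1, 3, 7), tol=1e-6):
    """Estimate the coefficients of frame x at the frequencies in `support`.

    Returns {frequency: coefficient}, or None when the caller should fall
    back to a full (sparse) FFT because the assumptions do not hold.
    """
    n = len(x)
    if n % num_bins != 0:
        return None
    # Assumption: each known frequency must land in its own bin (f mod num_bins).
    bin_to_freq = {}
    for f in support:
        b = f % num_bins
        if b in bin_to_freq:
            return None  # collision: this simple binning cannot separate them
        bin_to_freq[b] = f
    # Subsampling by `stride` in time aliases the spectrum into num_bins bins:
    # Y[j] = (num_bins / n) * sum of X[f] over f congruent to j mod num_bins.
    stride = n // num_bins
    y = small_dft([x[m * stride] for m in range(num_bins)])
    coeffs = {f: y[b] * stride for b, f in bin_to_freq.items()}
    # Hypothetical similarity check: the estimate must reproduce a few
    # samples of the current frame; otherwise the support has changed.
    for t in spot_checks:
        t %= n
        approx = sum(c * cmath.exp(2j * cmath.pi * f * t / n) for f, c in coeffs.items()) / n
        if abs(approx - x[t]) > tol * max(1.0, abs(x[t])):
            return None
    return coeffs


if __name__ == "__main__":
    frame = synthesize({5: 64 + 0j, 9: 32j}, 64)
    print(estimate_from_previous_support(frame, [5, 9], 8))
```

The sketch reads only `num_bins` plus a handful of spot-check samples from the frame, which is where the sub-linear cost comes from. Returning `None` models the fallback path taken when the stream is discontinuous and the predecessor's spectrum is no longer a useful guide.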

Parallel Sparse FFT.

Input-adaptive Parallel Sparse Fast Fourier Transform for Stream Processing

An Efficient Data Layout Transformation Algorithm for Locality-Aware Parallel Sparse FFT

An Input-Adaptive Algorithm for High Performance Sparse Fast Fourier Transform.

Parallel Fast Fourier Transform in SPMD Style of Cilk.

A Sparse Data Fast Fourier Transform (SDFFT) - Algorithm and Implementation

Parallel Optimization and Hardware Customization for Fast Fourier Transform

Empirical Evaluation of Typical Sparse Fast Fourier Transform Algorithms.

Some New Parallel Fast Fourier Transform Algorithms

An Optimized Parallel FFT Algorithm on Multiprocessors with Cache Technology in Linux

A sparse data fast Fourier transform (SDFFT)

A Parallel Algorithm of Three-Dimensional Fast Fourier Transform

Sparse Fast Clifford Fourier Transform.

MFFT: A GPU Accelerated Highly Efficient Mixed-precision Large-scale FFT Framework

Fast Fourier Transform Of Sparse Spatial Data To Sparse Fourier Data

Highly Effective FFT Algorithm Based on Parallel Techniques

A GPU Based Memory Optimized Parallel Method For FFT Implementation

HI-FFT: Heterogeneous Parallel In-Place Algorithm for Large-Scale 2D-FFT

MPI-Based Parallelized Precorrected FFT Algorithm for Analyzing Scattering by Arbitrarily Shaped Three-Dimensional Objects