A Parallel 1-D FFT Implementation Method for Multi-core Vector Processors

Zhong Liu,Xi Tian
DOI: https://doi.org/10.1007/978-981-13-5919-4_5
2019-01-01
Abstract:This paper presents an efficient parallel 1-D FFT implementation method based on the architecture features of multi-core vector processor. It divides the parallel computation of large-point 1-D FFT into the (n-m)-level parallel FFT computation and M-point parallel FFT computation according to the number of data points M that can be accommodated in the global cache (GC). The parallel FFT computation for each stage are performed using a shared DDR data method in (n-m)-level FFT computation. In the M-point parallel FFT computation, a parallel FFT computation method based on the matrix Fourier algorithm is designed, it converts the original M-point 1-D FFT computation into a 2-D FFT computation, and achieves parallel FFT computation using a shared GC data method, which avoids multiple data transfers between GC and AM and reduces data transmission overhead. Merge Column FFT computation with factor matrix multiplication and column FFT computation results in the AM, which further reduces the number of data transfer between AM and GC, and can significantly improve the efficiency of M-point FFT computation. The experimental results on Matrix show that the average speedup of the single-core single-precision 1-D FFT is 8.26 times and the average speedup of the dual-core single-precision 1-D FFT is 6.78 times compared with the TMS320C6678 with the same frequency.
What problem does this paper attempt to address?