Parallel FFT algorithm implementation based on coarse-grained reconfigurable architecture

Peng Cao, Jinjiang Yang, Chen Mei
DOI: https://doi.org/10.3969/j.issn.1001-0505.2013.06.008
2013-01-01
Abstract: To enhance the performance of the fast Fourier transform (FFT) algorithm, an implementation of the complex FFT is proposed on REMUS_LPP (reconfigurable embedded multimedia system, low performance processor), which is based on a coarse-grained reconfigurable architecture (CGRA). The lower stages of the FFT algorithm are performed in local serial mode; the higher stages are then carried out in parallel mode on the exchanged intermediate results of the lower stages. To optimize data transfer within and between reconfigurable computing arrays (RCAs), the techniques of pipeline bubble elimination and data block location rearrangement are presented, which enhance performance and reduce on-chip memory cost. The proposed FFT algorithm was realized on a real chip. Its processing speed is 2.15 to 13.60 times higher than that of other parallel FFT implementations, with only 7.0% to 28.1% of the local memory cost.
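The split between local serial lower stages and parallel higher stages can be illustrated with a small software sketch. The code below is not the authors' REMUS_LPP mapping: it is a plain radix-2 decimation-in-time FFT, written under the assumption that the input is divided into equal contiguous blocks, one per computing array, and it simulates the arrays sequentially. The names `two_phase_fft` and `num_arrays` are hypothetical, and pipeline bubble elimination and data block location rearrangement are not modeled.

```python
import cmath


def bit_reverse(i, bits):
    """Return i with its lowest `bits` bits reversed."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def butterfly(data, top, span, k):
    """One radix-2 DIT butterfly on data[top] and data[top + span]."""
    w = cmath.exp(-2j * cmath.pi * k / (2 * span))  # twiddle factor
    a = data[top]
    b = data[top + span] * w
    data[top] = a + b
    data[top + span] = a - b


def two_phase_fft(x, num_arrays):
    """FFT of x, split into local lower stages and shared higher stages.

    num_arrays is a hypothetical stand-in for the number of computing
    arrays; it must be a power of two no larger than len(x) // 2.
    """
    n = len(x)
    assert n >= 2 and n & (n - 1) == 0, "length must be a power of two"
    assert num_arrays & (num_arrays - 1) == 0 and 1 <= num_arrays <= n // 2

    bits = n.bit_length() - 1
    data = [0j] * n
    for i, value in enumerate(x):
        data[bit_reverse(i, bits)] = complex(value)

    block = n // num_arrays

    # Phase 1: lower stages. Every butterfly stays inside one block, so
    # each array can run all of its stages serially without exchanges.
    for p in range(num_arrays):
        span = 1
        while span < block:
            for base in range(p * block, (p + 1) * block, 2 * span):
                for k in range(span):
                    butterfly(data, base + k, span, k)
            span *= 2

    # Phase 2: higher stages. Butterflies pair elements from different
    # blocks, so intermediate results must be exchanged. The n // 2
    # independent butterflies of each stage are divided among the arrays.
    per_array = (n // 2) // num_arrays
    span = block
    while span < n:
        for p in range(num_arrays):  # each iteration models one array
            for j in range(p * per_array, (p + 1) * per_array):
                group, k = divmod(j, span)
                butterfly(data, group * 2 * span + k, span, k)
        span *= 2
    return data


def naive_dft(x):
    """O(n^2) reference DFT used only to check the sketch."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]
```

With `num_arrays = 1` the second phase is empty and the sketch reduces to an ordinary serial FFT; larger values move more stages into the shared phase, which is where inter-array data transfer would matter on real hardware.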