An Efficient SIMD Parallel Memory Structure for Radix-2 FFT Computation

Hai-yan CHEN, Chao YANG, Sheng LIU, Zhong LIU
DOI: https://doi.org/10.3969/j.issn.0372-2112.2016.02.001
2016-01-01
Abstract: As more and more execution units are integrated into digital signal processors (DSPs) with single instruction multiple data (SIMD) extensions, the flexibility and bandwidth efficiency of parallel memory access have a significant effect on overall practical performance. Based on a detailed analysis of the memory access problems of the radix-2 fast Fourier transform (FFT) algorithm on a general SIMD DSP, this paper applies XOR logic to a subset of the address bits to implement memory access address translation, achieving conflict-free parallel SIMD memory accesses for FFT computation. Several memory access instructions with special shuffle modes are then proposed, which completely eliminate the extra shuffle instructions that the radix-2 FFT algorithm otherwise requires on a SIMD architecture. Finally, the vector memory (VM) of the 16-way SIMD DSP YHFT-Matrix2 is optimized using these methods. Test results show that the optimized VM achieves fully pipelined, conflict-free memory accesses and 100% parallel memory access bandwidth utilization, at the cost of an 18% increase in area. Compared with the design before optimization, radix-2 FFTs of various sizes achieve speedups ranging from 1.32 to 2.66.
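To illustrate why XOR-based address translation can remove bank conflicts, the following is a minimal sketch, not the paper's actual design. The abstract does not state which address bits are XORed, so the folding scheme in `bank_xor` is an assumption (a generic XOR fold over 4-bit address groups), and all function names are hypothetical. It compares plain low-order interleaving with the XOR mapping for the power-of-two strides that radix-2 butterflies produce on 16 banks.

```python
WAYS = 16       # SIMD lanes and memory banks (16-way, as in the abstract)
BANK_BITS = 4   # log2(WAYS)


def bank_plain(addr):
    """Plain low-order interleaving: bank = low 4 address bits."""
    return addr & (WAYS - 1)


def bank_xor(addr):
    """Assumed XOR mapping: fold all 4-bit address groups together.

    This is an illustrative scheme, not the one from the paper.
    """
    bank = 0
    while addr:
        bank ^= addr & (WAYS - 1)
        addr >>= BANK_BITS
    return bank


def strided_addresses(base, stride):
    """Addresses touched by one 16-lane vector access with a given stride."""
    return [base + lane * stride for lane in range(WAYS)]


def is_conflict_free(bank_fn, addrs):
    """True if every lane hits a different bank."""
    return len({bank_fn(a) for a in addrs}) == len(addrs)


if __name__ == "__main__":
    for s in range(7):  # strides 1, 2, 4, ..., 64
        stride = 1 << s
        addrs = strided_addresses(0, stride)
        print(stride,
              is_conflict_free(bank_plain, addrs),
              is_conflict_free(bank_xor, addrs))
```

With plain interleaving, a stride of 16 sends every lane to the same bank, because the low four address bits never change. Under the assumed XOR fold, the lane index is merely rotated and XORed with a constant, so the 16 lanes land in 16 distinct banks whenever the base address has zeros in the four bits occupied by the lane index.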