Vectorizable Design and Implementation of FFT Based on Fused Multiply-add Architectures

Junyang Zhang, Yang Guo, Xiao Hu
DOI: https://doi.org/10.12783/dtetr/iceta2016/6968
2017-01-01
DEStech Transactions on Engineering and Technology Research
Abstract: This paper proposes an efficient method for mapping FFT algorithms onto vector processors using fused multiply-add (FMA) instructions. Based on the architectural features of YHFT-Matrix, the method combines shuffle requirements with memory access requests to reduce the number of shuffle patterns, and uses software pipelining to fully exploit the instruction-level and data-level parallelism of FFT algorithms, thereby improving computational performance. Experimental results show that the FFT algorithms achieve high computing performance and speedups. For instance, after FMA instruction optimization, the chip's computational efficiency for the 1024-point double-precision floating-point FFT is about 10% higher than before.