Software and Hardware Cooperate for 1-D FFT Algorithm Optimization on Multicore Processors

Yongbin Zhou, Junchao Zhang, Dongrui Fan
DOI: https://doi.org/10.1109/CIT.2009.101
2009-01-01
Abstract: Multicore architecture has become a promising way to sustain Moore's Law and has brought a revolution in both research and industry, opening a new design space for software and architecture. The fast Fourier transform (FFT), which is both compute-intensive and bandwidth-intensive, is one of the most popular and important applications. Compared with the computing resources on a multicore architecture, on-chip memory is much more expensive because of the limited physical chip size. Implementing the FFT efficiently on multicore processors with good scalability is a challenge for both software and hardware developers. In this paper, with support from the Godson-T architecture, an optimized implementation of the 1-D FFT is developed that hides the matrix transposes and overlaps computation with communication. It achieves more than 30% performance improvement and reduces L2 cache consumption by almost 1/3 compared with the baseline six-step FFT. The limits of scalability are also analyzed; the conclusion is that on Godson-T, when frequent and simultaneous data accesses occur, the limited access bandwidth of the L2 cache is the bottleneck and results in longer on-chip network latency.
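For orientation, the baseline six-step FFT that the abstract compares against can be sketched as follows. This is a minimal textbook-style illustration, not the paper's Godson-T implementation. The function names (`dft`, `transpose`, `six_step_fft`) and the factorization parameters `n1` and `n2` are assumptions introduced here, and a naive O(n^2) DFT stands in for an optimized row FFT kernel. The three explicit transposes are the steps that a transpose-hiding optimization would presumably target.

```python
import cmath


def dft(row):
    """Naive O(n^2) DFT; a stand-in for an optimized row FFT kernel."""
    n = len(row)
    return [
        sum(row[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def six_step_fft(x, n1, n2):
    """Length n1*n2 DFT of x via the classic six-step decomposition."""
    n = n1 * n2
    assert len(x) == n

    # View x as an n2 x n1 row-major matrix: a[j2][j1] = x[j2*n1 + j1].
    a = [x[r * n1:(r + 1) * n1] for r in range(n2)]

    b = transpose(a)                 # Step 1: transpose to n1 x n2
    c = [dft(row) for row in b]      # Step 2: n1 row DFTs of length n2
    c = [                            # Step 3: multiply by twiddle factors
        [c[j1][k2] * cmath.exp(-2j * cmath.pi * j1 * k2 / n)
         for k2 in range(n2)]
        for j1 in range(n1)
    ]
    d = transpose(c)                 # Step 4: transpose to n2 x n1
    e = [dft(row) for row in d]      # Step 5: n2 row DFTs of length n1
    f = transpose(e)                 # Step 6: transpose to n1 x n2

    # Row-major flattening gives X[k1*n2 + k2] in natural order.
    return [value for row in f for value in row]
```

Calling `six_step_fft(x, 3, 4)` on a 12-element list should agree with `dft(x)` up to floating-point rounding. Each of the three transposes touches the whole data set, which is why they matter for both bandwidth and on-chip memory use.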