A High-Throughput and Flexible Architecture Based on a Reconfigurable Mixed-Radix FFT with Twiddle Factor Compression and Conflict-Free Access
Chen Yang, Junfeng Wu, Siwei Xiang, Liyan Liang, Li Geng
DOI: https://doi.org/10.1109/tvlsi.2023.3298943
2023-01-01
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Abstract: Mixed-radix fast Fourier transform (FFT) algorithms are widely adopted in high-performance communication systems, such as 5G systems. However, in the decimation-in-time (DIT) FFT algorithm, neither the input time-domain series nor the intermediate-stage data are fetched in natural order, which causes a mismatch between the processing element (PE) computation speed and the memory bandwidth. In addition, multiple duplicate interstage twiddle factor (TF) generation units are typically used to provide interstage TFs in parallel, and such duplication results in considerable waste of computing and storage resources. In this article, we propose a flexible and reconfigurable architecture based on a mixed-radix FFT approach, supporting 61 different FFT size modes from 12 to 3240 points ($2^{\alpha}$ or $12 \times n$, where $n \le 270$ and $n = 2^{\alpha} 3^{\beta} 5^{\gamma}$). A DIT-FFT-based multiple-parallel changeable-radix butterfly unit (BU) is designed to improve hardware resource utilization. Around the PE array, a conflict-free access structure with a hardware-friendly address generation method is presented. In addition, a TF sharing and compression structure is designed to reduce the area and delay of the TF generation units. The FFT architecture is implemented in Semiconductor Manufacturing International Corporation (SMIC) 40-nm CMOS technology with a working frequency of 483 MHz, achieving a throughput of 654 MS/s for 2048-point and 1536-point FFTs. Compared to the state-of-the-art mixed-radix FFT designs, our architecture achieves improvements of up to $5.89\times$ in throughput and area efficiency and supports more FFT modes.
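
The size rule quoted in the abstract can be checked with a short enumeration. The following is a minimal Python sketch, not the authors' method: the function names `is_5_smooth` and `supported_sizes` are hypothetical, and it assumes that the $2^{\alpha}$ modes are exactly the powers of two lying inside the stated 12-to-3240 range. Under that assumption the enumeration yields 61 sizes, which agrees with the count given in the abstract.

```python
def is_5_smooth(n: int) -> bool:
    """Return True if n has no prime factors other than 2, 3 and 5."""
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1


def supported_sizes(lo: int = 12, hi: int = 3240, base: int = 12, n_max: int = 270):
    """Enumerate FFT sizes of the form 2^a or base*n with n = 2^a 3^b 5^c, n <= n_max.

    Assumption (not stated in the abstract): power-of-two modes are limited
    to the interval [lo, hi].
    """
    sizes = set()

    # Power-of-two sizes inside [lo, hi].
    size = 1
    while size <= hi:
        if size >= lo:
            sizes.add(size)
        size *= 2

    # Sizes 12*n where n is 5-smooth and n <= 270.
    for n in range(1, n_max + 1):
        if is_5_smooth(n):
            sizes.add(base * n)

    return sorted(sizes)


sizes = supported_sizes()
print(len(sizes), sizes[0], sizes[-1])  # prints: 61 12 3240
```

Both sizes named in the implementation results fall out of this rule: 2048 is $2^{11}$, and 1536 is $12 \times 128$ with $128 = 2^{7}$.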