Abstract: Floating-point fast Fourier transform (FFT) is in wide demand in scientific computing and high-resolution imaging applications because of its wide dynamic range and high processing precision. However, it suffers from high area and energy overhead compared with fixed-point implementations. To address these issues, this paper presents an area- and energy-efficient hybrid architecture for floating-point FFT computations. It minimizes the required arithmetic units and significantly reduces memory usage by combining three different parts. The serial radix-4 butterfly (SR4BF) is used in the single-path delay commutator (SDC) part to minimize the required arithmetic units, achieving a 100% adder utilization ratio. A modified single-path delay feedback (MSDF) architecture is proposed to trade off arithmetic resources against memory usage by using the new half radix-4 butterfly (HR4BF), which achieves a 50% adder utilization ratio. The intermediate caching buffer is modified accordingly in the MSDF part. By combining the arithmetic-unit reduction and the memory-usage optimization of the different parts, area and power are optimized without throughput loss. Logic synthesis results in a 65 nm CMOS technology show that the energy per FFT is about 331.5 nJ for 1024-point FFT computations at 400 MHz. The total hardware overhead is equivalent to 460k NAND2 gates.
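
For readers unfamiliar with the radix-4 butterfly that the abstract builds on, the following is a minimal software sketch of the textbook operation (a 4-point DFT computed with eight complex additions and no general multiplications). It is not the SR4BF or HR4BF hardware described in the paper, whose internal scheduling the abstract does not specify; the function names `radix4_butterfly` and `dft4` are hypothetical and chosen only for illustration.

```python
import cmath


def radix4_butterfly(x0, x1, x2, x3):
    """Textbook radix-4 butterfly: a 4-point DFT built from additions only.

    Illustrative assumption: this is the generic butterfly, not the
    serial (SR4BF) or half (HR4BF) variant from the paper.
    """
    # First stage: four complex additions/subtractions.
    a = x0 + x2
    b = x0 - x2
    c = x1 + x3
    d = x1 - x3
    # Second stage: four more additions; multiplying by -j or +j only
    # swaps real and imaginary parts (with a sign change) in hardware.
    y0 = a + c
    y1 = b - 1j * d
    y2 = a - c
    y3 = b + 1j * d
    return [y0, y1, y2, y3]


def dft4(x):
    """Direct 4-point DFT from the definition, used as a reference."""
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / 4) for n in range(4))
        for k in range(4)
    ]
```

Comparing `radix4_butterfly(*x)` against `dft4(x)` for any four complex inputs should agree to within floating-point rounding. A fully parallel hardware butterfly would need all eight adders at once; the abstract's utilization figures (100% for SR4BF, 50% for HR4BF) describe how busy the adders actually instantiated in each variant are kept.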

Vectorizable Design and Implementation of FFT Based on Fused Multiply-add Architectures

Analysis of FFT transform and implementation of circuit design based on FPGA

A Low Latency High Throughput Multiply-accumulator Unit for Float Point and Integer

Approximate Floating-Point FFT Design with Wide Precision-Range and High Energy Efficiency.

An Efficient SIMD Parallel Memory Structure for Radix-2 FFT Computation

Design and Applications of a New Type of FFT Processor with High Efficiency

Vector Processing Support for FPGA-Oriented High Performance Applications

Efficient Utilization of Vector Registers to Improve FFT Performance on SIMD Microprocessors

Design and Implementation of High Speed Fixed-Point Fast Fourier Transform Processor

A GPU Based Memory Optimized Parallel Method For FFT Implementation

A VLSI Array Processing Oriented Fast Fourier Transform Algorithm and Hardware Implementation

An Area- and Energy-Efficient Hybrid Architecture for Floating-Point FFT Computations.

An Improved FFT Architecture Optimized for Reconfigurable Application Specified Processor

Parallel Solution of FFT on FPGA-Based NUMA Multiprocessor System on Chip

VLSI Design of an Efficient Reconfigurable FFT Processor and its Application

An Efficient Radix-2 Fast Fourier Transform Processor with Ganged Butterfly Engines on Field Programmable Gate Arrays.

Efficient and Flexible Implementation of FFT Application for CGRA Processor

A Pipelined Algorithm and Area-Efficient Architecture for Serial Real-Valued FFT

Acceleration of FFT Algorithm Based on Reconfigurable Computing Architecture

Design of Field Programmable Gate Array Based Real-Time Double-Precision Floating-Point Matrix Multiplier