Abstract: The rapid updates in error-resilient applications, along with their quest for high throughput, have motivated the design of fast approximate functional units for Field-Programmable Gate Arrays (FPGAs). Studies that have proposed imprecise functional techniques suffer from three shortcomings. First, most inexact multipliers and dividers are specialized for Application-Specific Integrated Circuit (ASIC) platforms. Second, state-of-the-art (SoA) approximate units are mostly substituted into only a single kernel of a multi-kernel application; moreover, the end-to-end assessment covers the Quality of Results (QoR) but not the overall performance gain. Finally, existing imprecise components are not designed to support a pipelined approach, which could boost the operating frequency and throughput of, e.g., applications that include division. In this paper, we propose RAPID, the first pipelined approximate multiplier and divider architecture customized for FPGAs. The proposed units efficiently utilize 6-input Look-up Tables (6-LUTs) and fast carry chains to implement Mitchell's approximate algorithms. Our novel error-refinement scheme not only has negligible overhead over the baseline Mitchell's approach but also boosts its accuracy to 99.4% for arbitrary sizes of multiplication and division. Experimental results demonstrate the efficiency of the proposed pipelined and non-pipelined RAPID multipliers and dividers over their accurate counterparts. Moreover, end-to-end evaluations of RAPID, deployed in three multi-kernel applications in the domains of bio-signal processing, image processing, and moving-object tracking for Unmanned Aerial Vehicles (UAVs), indicate up to 45% improvement in area, latency, and Area-Delay-Product (ADP) over accurate kernels, with negligible loss in QoR.
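For readers unfamiliar with Mitchell's approximate algorithms mentioned in the abstract, the following is a minimal software sketch of the baseline logarithmic multiplication and division, assuming positive integer operands. It does not reproduce RAPID's error-refinement scheme or its LUT and carry-chain mapping; the function names (`mitchell_mul`, `mitchell_div`, `_log_parts`) are hypothetical and chosen for illustration only.

```python
from fractions import Fraction


def _log_parts(n):
    """Split n = 2**k * (1 + x) into (k, x) with 0 <= x < 1."""
    if n <= 0:
        raise ValueError("operands must be positive integers")
    k = n.bit_length() - 1        # position of the leading one
    x = Fraction(n, 1 << k) - 1   # mantissa fraction; Mitchell uses log2(1 + x) ~ x
    return k, x


def _antilog(k, x):
    """Approximate 2**(k + x) as 2**k * (1 + x), after normalizing x into [0, 1)."""
    if x >= 1:                    # fraction overflow carries into the characteristic
        k, x = k + 1, x - 1
    elif x < 0:                   # fraction underflow borrows from the characteristic
        k, x = k - 1, x + 1
    return Fraction(2) ** k * (1 + x)


def mitchell_mul(a, b):
    """Approximate a * b by adding the approximate base-2 logarithms."""
    ka, xa = _log_parts(a)
    kb, xb = _log_parts(b)
    return _antilog(ka + kb, xa + xb)


def mitchell_div(a, b):
    """Approximate a / b by subtracting the approximate base-2 logarithms."""
    ka, xa = _log_parts(a)
    kb, xb = _log_parts(b)
    return _antilog(ka - kb, xa - xb)


if __name__ == "__main__":
    print(float(mitchell_mul(5, 6)), 5 * 6)
    print(float(mitchell_div(5, 3)), 5 / 3)
```

Exact rational arithmetic (`Fraction`) is used here only to keep the sketch deterministic; a hardware unit would instead use a leading-one detector, shifters, and a fixed-width adder. The unrefined multiplication never overestimates, and its relative error peaks at about 11.1% when both mantissa fractions equal 0.5 (for example, 3 × 3 yields 8), which is the gap an error-refinement step aims to close.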
