Fast and Low-Cost Approximate Multiplier for FPGAs using Dynamic Reconfiguration

Shervin Vakili, Mobin Vaziri, Amirhossein Zarei, J.M. Pierre Langlois

2023-10-16

Abstract: Multipliers are widely used arithmetic operators in digital signal processing and machine learning circuits. Due to their relatively high complexity, they can have high latency and be a significant source of power consumption. One strategy to alleviate these limitations is to use approximate computing. This paper thus introduces an original FPGA-based approximate multiplier specifically optimized for machine learning computations. It utilizes dynamically reconfigurable lookup table (LUT) primitives in AMD-Xilinx technology to realize the core part of the computations. The paper provides an in-depth analysis of the hardware architecture, implementation outcomes, and accuracy evaluations of the proposed multiplier in INT8 precision. Implementation results on an AMD-Xilinx Kintex UltraScale+ FPGA demonstrate savings of 64% and 67% in LUT utilization for signed multiplication and multiply-and-accumulate configurations, respectively, compared to the standard Xilinx multiplier core. Accuracy measurements on four popular deep learning (DL) benchmarks indicate an average accuracy decrease of less than 0.29% during post-training deployment, with the maximum reduction staying below 0.33%. The source code of this work is available on GitHub.
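To make the idea of a dynamically reconfigurable LUT concrete, here is a minimal behavioral sketch in Python. It is not the paper's circuit: the class `ReconfigurableLUT` and the helpers `load_multiples` and `lookup_product` are hypothetical names, and the mapping of operands onto LUTs is an assumption chosen only to show how rewriting a truth table at run time can turn a lookup into a multiplication by a changeable coefficient.

```python
class ReconfigurableLUT:
    """Toy model of a 5-input, 1-output LUT whose 32-bit truth table can be rewritten."""

    def __init__(self, table=0):
        self.table = table & 0xFFFFFFFF

    def reconfigure(self, new_table):
        # Real hardware would load the new table over several clock cycles;
        # this model simply overwrites it in one step.
        self.table = new_table & 0xFFFFFFFF

    def read(self, address):
        # The 5-bit address selects one bit of the truth table.
        return (self.table >> (address & 0x1F)) & 1


def load_multiples(luts, coefficient):
    """Program one LUT per output bit so the bank returns coefficient * address."""
    for bit, lut in enumerate(luts):
        table = 0
        for address in range(32):
            if ((coefficient * address) >> bit) & 1:
                table |= 1 << address
        lut.reconfigure(table)


def lookup_product(luts, address):
    """Read every LUT at the same address and assemble the product bits."""
    return sum(lut.read(address) << bit for bit, lut in enumerate(luts))


# Assumed sizes: 5-bit address times 3-bit coefficient needs at most 8 product bits.
luts = [ReconfigurableLUT() for _ in range(8)]
load_multiples(luts, 5)
print(lookup_product(luts, 6))  # 30
```

In this sketch the coefficient lives in the LUT contents, so changing it means reloading the tables instead of adding logic. How the actual design divides the work between LUT contents and LUT inputs is not stated in this section.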

Hardware Architecture


### What problem does this paper attempt to solve?

This paper addresses the implementation of efficient, low-latency, low-cost approximate multipliers on FPGAs, specifically for machine learning computations.

#### Main Objectives:

1. **Reduce latency and power consumption**: Implement the core of the multiplication with dynamically reconfigurable lookup tables (LUTs) to reduce latency and power consumption.
2. **Reduce hardware resource usage**: Use fewer LUTs than the standard Xilinx multiplier core, with reported savings of 64% for signed multiplication and 67% for multiply-and-accumulate.
3. **Maintain accuracy**: Keep accuracy loss small in post-training deployment, with an average accuracy drop of less than 0.29% and a maximum drop of less than 0.33%.

#### Specific Implementation Methods:

- **Dynamically Reconfigurable LUT**: Use reconfigurable LUT primitives in AMD-Xilinx technology to implement the core part of the multiplication.
- **Internal Format Conversion**: Convert the fixed-point representation to a floating-point representation internally to preserve dynamic range, which is beneficial for machine learning applications.
- **Optimized Encoding and Decoding**: Design efficient encoding and decoding circuits that convert INT8 data to an 8-bit floating-point format and restore the result to INT8 format.

Through these methods, the paper proposes a new architecture named DyRecMul, which significantly reduces hardware resource usage while keeping accuracy loss small.
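The encode, multiply, decode flow described above can be illustrated with a small Python sketch. The bit allocation is an assumption: this section does not give the paper's floating-point layout, so the 4-bit mantissa (`MANT_BITS`), the truncating rounding, and the function names `encode`, `decode`, and `approx_mul` are hypothetical. The sketch returns the approximate full-width product and leaves out the final rescaling to INT8, since the scaling is not specified here.

```python
MANT_BITS = 4  # hypothetical mantissa width, not taken from the paper


def encode(x):
    """Split a signed 8-bit integer into (sign, exponent, mantissa).

    The magnitude is approximated as mantissa << exponent, keeping only the
    MANT_BITS most significant bits; lower bits are truncated.
    """
    if not -128 <= x <= 127:
        raise ValueError("x must fit in INT8")
    sign = x < 0
    magnitude = -x if sign else x
    exponent = max(magnitude.bit_length() - MANT_BITS, 0)
    mantissa = magnitude >> exponent
    return sign, exponent, mantissa


def decode(sign, exponent, mantissa):
    """Rebuild a signed integer from (sign, exponent, mantissa)."""
    magnitude = mantissa << exponent
    return -magnitude if sign else magnitude


def approx_mul(a, b):
    """Approximate a * b by multiplying short mantissas and adding exponents."""
    sign_a, exp_a, mant_a = encode(a)
    sign_b, exp_b, mant_b = encode(b)
    return decode(sign_a != sign_b, exp_a + exp_b, mant_a * mant_b)


print(approx_mul(100, 50))  # 4608 (exact product: 5000)
print(approx_mul(-7, 9))    # -63 (small operands are exact)
```

Only a 4-bit by 4-bit mantissa product is computed in this sketch, which is the kind of small multiplication that fits in lookup tables. Because truncation discards low-order bits, the result never exceeds the exact product in magnitude. Large operands keep their leading bits, which is what preserves dynamic range.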

Optimally Approximated and Unbiased Floating-Point Multiplier with Runtime Configurability

A Reconfigurable Multiplier for Signed Multiplications with Asymmetric Bit-Widths.

A Reconfigurable Approximate Multiplier for Quantized CNN Applications.

FHAM: FPGA-based High-Efficiency Approximate Multipliers Via LUT Encoding

LCAM: Low-Cost Approximate Multiplier Design on FPGA.

High-Performance Accurate and Approximate Multipliers for FPGA-Based Hardware Accelerators

LMM: A Fixed-Point Linear Mapping Based Approximate Multiplier for IoT

PAM: A Piecewise-Linearly-Approximated Floating-Point Multiplier with Unbiasedness and Configurability

FPGA-Based Approximate Multiplier for Efficient Neural Computation

FPGA-Based Resource-Optimal Approximate Multiplier for Error-Resilient Applications

AMG: Automated Efficient Approximate Multiplier Generator for FPGAs via Bayesian Optimization

Efficient Approximate Floating-Point Multiplier With Runtime Reconfigurable Frequency and Precision

A Data-Distribution Aware Approximate Multiplier Design Based on FPGA

A Power-Efficient Hardware Implementation of L-Mul

Ultra-Fast, High-Performance 8x8 Approximate Multipliers by a New Multicolumn 3,3:2 Inexact Compressor and its Derivatives

Hardware-accuracy trade-offs for error-resilient applications using an ultra-efficient hybrid approximate multiplier

A Hardware- and Accuracy-Efficient Approximate Multiplier with Error Compensation for Neural Network and Image Processing Applications

Efficient implementation of signed multipliers on FPGAs

High-Speed Energy-Efficient Fixed-Point Signed Multipliers for FPGA-Based DSP Applications

RAPID: AppRoximAte Pipelined Soft Multipliers and Dividers for High-Throughput and Energy-Efficiency