Abstract: Precision scaling has emerged as a popular technique to optimize the compute and storage requirements of Deep Neural Networks (DNNs). Efforts toward creating ultra-low-precision (sub-8-bit) DNNs suggest that the minimum precision required to achieve a given network-level accuracy varies considerably across networks, and even across layers within a network, requiring support for variable precision in DNN hardware. Previous proposals such as bit-serial hardware incur high overheads, significantly diminishing the benefits of lower precision. To efficiently support precision re-configurability in DNN accelerators, we introduce an approximate computing method wherein DNN computations are performed block-wise (a block is a group of bits) and re-configurability is supported at the granularity of blocks. Results of block-wise computations are composed in an approximate manner to enable efficient re-configurability. We design a DNN accelerator that embodies approximate blocked computation and propose a method to determine a suitable approximation configuration for a given DNN. By varying the approximation configurations across DNNs, we achieve 1.17x-1.73x and 1.02x-2.04x improvement in system energy and performance respectively, over an 8-bit fixed-point (FxP8) baseline, with negligible loss in classification accuracy. Further, by varying the approximation configurations across layers and data-structures within DNNs, we achieve 1.25x-2.42x and 1.07x-2.95x improvement in system energy and performance respectively, with negligible accuracy loss.

What problem does this paper attempt to address?

The core problem this paper addresses is: **how to efficiently support variable-precision computing in deep neural network (DNN) hardware accelerators, improving energy efficiency and performance while keeping the loss in accuracy low**. Specifically, the paper focuses on:

1. **The need for reduced precision**: Low-precision (sub-8-bit) computing is a popular technique for improving the energy efficiency of DNN inference. However, the minimum precision required varies greatly across networks, across layers, and even across data structures, so the hardware must support variable-precision computing.
2. **Limitations of existing methods**: Existing variable-precision hardware (such as bit-serial architectures) does support variable precision, but it incurs high energy and latency overheads that erode the benefits of lower precision.
3. **Proposing the Ax-BxP method**: To address these problems, the paper proposes Ax-BxP (Approximate Blocked Computation). The method performs multiply-accumulate operations block-wise and introduces approximation by performing only some of the required block-level computations, which makes variable-precision computing efficient.

### Main features of Ax-BxP:

- **Block-level computing**: Weights and activation values are divided into fixed-length blocks, each containing multiple bits.
- **Approximate computing**: Approximation is introduced by performing only some of the block-level computations, which yields an efficient variable-precision configuration.
- **Hardware design**: The paper proposes architectural enhancements to a DNN accelerator based on a standard systolic array to support Ax-BxP computing.

### Experimental results:

For DNN models such as AlexNet, ResNet50, and MobileNetV2, the Ax-BxP method achieved improvements of 1.1x-1.74x in system energy and 1.02x-2x in performance, with a very small loss in classification accuracy (less than 1% on average). In addition, by tuning the approximation configuration at a finer granularity, across layers and data structures within a DNN, system energy and performance improved further (by 1.12x-2.23x and 1.14x-2.34x respectively).

### Summary:

By proposing the Ax-BxP method, the paper addresses the problem of efficiently supporting variable-precision computing in DNN hardware accelerators, significantly improving energy efficiency and performance while keeping the loss in accuracy low.
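The block-level idea described above can be illustrated with a small Python sketch. It is an illustration only, not the paper's implementation. It assumes unsigned 8-bit operands split into four 2-bit blocks, and it assumes one simple approximation rule (keep only the most significant blocks of each operand); the summary does not say which blocks Ax-BxP actually selects. The names `split_blocks`, `blocked_multiply`, `keep_a`, and `keep_b` are hypothetical.

```python
def split_blocks(value, num_blocks, block_bits):
    """Split an unsigned integer into blocks of block_bits bits, least significant first."""
    mask = (1 << block_bits) - 1
    return [(value >> (i * block_bits)) & mask for i in range(num_blocks)]


def blocked_multiply(a, b, num_blocks=4, block_bits=2, keep_a=None, keep_b=None):
    """Multiply a and b block-wise.

    keep_a / keep_b give how many of the most significant blocks of each
    operand take part (an assumed selection rule, for illustration only).
    Returns (result, number of block-level multiplications performed).
    """
    keep_a = num_blocks if keep_a is None else keep_a
    keep_b = num_blocks if keep_b is None else keep_b
    a_blocks = split_blocks(a, num_blocks, block_bits)
    b_blocks = split_blocks(b, num_blocks, block_bits)

    total = 0
    block_products = 0
    for i in range(num_blocks - keep_a, num_blocks):
        for j in range(num_blocks - keep_b, num_blocks):
            # Each block product is shifted to its position in the full result.
            total += (a_blocks[i] * b_blocks[j]) << (block_bits * (i + j))
            block_products += 1
    return total, block_products


exact, exact_ops = blocked_multiply(183, 109)
approx, approx_ops = blocked_multiply(183, 109, keep_a=2, keep_b=2)
print(exact, exact_ops)    # 19947 16
print(approx, approx_ops)  # 16896 4
```

With all blocks kept, the sum of the 16 shifted block products equals the exact product. Keeping two blocks per operand cuts the work to 4 block products at the cost of some error, which is the kind of accuracy-versus-effort trade-off the summary describes.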

Ax-BxP: Approximate Blocked Computation for Precision-Reconfigurable Deep Neural Network Acceleration
