Ax-BxP: Approximate Blocked Computation for Precision-Reconfigurable Deep Neural Network Acceleration

Reena Elangovan, Shubham Jain, Anand Raghunathan
DOI: https://doi.org/10.48550/arXiv.2011.13000
2021-10-29
Abstract: Precision scaling has emerged as a popular technique to optimize the compute and storage requirements of Deep Neural Networks (DNNs). Efforts toward creating ultra-low-precision (sub-8-bit) DNNs suggest that the minimum precision required to achieve a given network-level accuracy varies considerably across networks, and even across layers within a network, requiring support for variable precision in DNN hardware. Previous proposals such as bit-serial hardware incur high overheads, significantly diminishing the benefits of lower precision. To efficiently support precision re-configurability in DNN accelerators, we introduce an approximate computing method wherein DNN computations are performed block-wise (a block is a group of bits) and re-configurability is supported at the granularity of blocks. Results of block-wise computations are composed in an approximate manner to enable efficient re-configurability. We design a DNN accelerator that embodies approximate blocked computation and propose a method to determine a suitable approximation configuration for a given DNN. By varying the approximation configurations across DNNs, we achieve 1.17x-1.73x and 1.02x-2.04x improvement in system energy and performance respectively, over an 8-bit fixed-point (FxP8) baseline, with negligible loss in classification accuracy. Further, by varying the approximation configurations across layers and data-structures within DNNs, we achieve 1.25x-2.42x and 1.07x-2.95x improvement in system energy and performance respectively, with negligible accuracy loss.
Machine Learning, Hardware Architecture, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the following problem: **how to efficiently support variable-precision computation in deep neural network (DNN) hardware accelerators, improving energy efficiency and performance while keeping the loss in accuracy small**. Specifically, it focuses on:

1. **The need for variable precision**: Low-precision (sub-8-bit) computation is a popular way to improve the energy efficiency of DNN inference. However, the minimum precision required varies considerably across networks, across layers, and even across data structures within a network, so the hardware must support variable precision.
2. **Limitations of existing methods**: Existing variable-precision hardware, such as bit-serial architectures, does support variable precision, but it incurs high energy and latency overheads that erode the benefits of lower precision.
3. **The Ax-BxP method**: To address these problems, the paper proposes Ax-BxP (Approximate Blocked Computation). Multiply-accumulate operations are performed block by block, and approximation is introduced by performing only a subset of the required block-level computations, which yields efficient variable-precision computation.

### Main features of Ax-BxP:

- **Block-level computing**: Weights and activations are divided into fixed-length blocks, where each block is a group of bits.
- **Approximate computing**: Only some of the block-level computations are performed, which provides an efficient way to reconfigure precision.
- **Hardware design**: The paper proposes architectural enhancements to a standard systolic-array DNN accelerator to support Ax-BxP computation.
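
The block-level and approximate computing ideas described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `split_blocks` and `blocked_multiply`, the use of unsigned 8-bit operands with 2-bit blocks, and the rule of keeping only the most significant blocks of each operand are all assumptions made for illustration. The paper's own block-selection scheme and signed fixed-point handling may differ.

```python
def split_blocks(x, n_bits=8, block_bits=2):
    """Split an unsigned n_bits integer into blocks, least significant first."""
    mask = (1 << block_bits) - 1
    return [(x >> shift) & mask for shift in range(0, n_bits, block_bits)]


def blocked_multiply(w, a, n_bits=8, block_bits=2, keep_blocks=None):
    """Multiply w and a as a sum of shifted block-by-block partial products.

    With keep_blocks=None every block pair is computed and the result is exact.
    Otherwise only the keep_blocks most significant blocks of each operand are
    used (an assumed selection rule, chosen here for simplicity).
    """
    w_blocks = split_blocks(w, n_bits, block_bits)
    a_blocks = split_blocks(a, n_bits, block_bits)
    n = len(w_blocks)
    first = 0 if keep_blocks is None else n - keep_blocks
    total = 0
    for i in range(first, n):
        for j in range(first, n):
            # Block i of w and block j of a carry a weight of 2^((i + j) * block_bits).
            total += (w_blocks[i] * a_blocks[j]) << ((i + j) * block_bits)
    return total


if __name__ == "__main__":
    print(blocked_multiply(173, 94))                 # 16262, same as 173 * 94
    print(blocked_multiply(173, 94, keep_blocks=2))  # 12800, using 4 of 16 block products
```

In this sketch, keeping 2 of the 4 blocks per operand cuts the number of block-level multiplications from 16 to 4, at the cost of dropping the low-order bits of both operands. This is the kind of trade-off between computation and accuracy that an approximation configuration controls.
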
### Experimental results:

For DNN models such as AlexNet, ResNet50, and MobileNetV2, varying the approximation configuration across DNNs improves system energy by 1.17x-1.73x and performance by 1.02x-2.04x over an 8-bit fixed-point (FxP8) baseline, with a very small loss in classification accuracy (less than 1% on average). Tuning the approximation configuration more finely, across layers and data structures within a DNN, raises these improvements to 1.25x-2.42x in system energy and 1.07x-2.95x in performance.

### Summary:

With Ax-BxP, the paper provides an efficient way to support variable-precision computation in DNN hardware accelerators, significantly improving energy efficiency and performance while keeping the loss in accuracy small.