Abstract: State-of-the-art approaches employ approximate computing to reduce the energy consumption of DNN hardware. Approximate DNNs then require extensive retraining to recover the accuracy lost to approximate operations. However, retraining of complex DNNs does not scale well. In this paper, we demonstrate that efficient approximations can be introduced into the computational path of DNN accelerators while retraining is avoided completely. The proposed method, ALWANN, provides highly optimized implementations of DNNs for custom low-power accelerators in which the number of computing units is lower than the number of DNN layers. First, a fully trained DNN is converted to operate with 8-bit weights and 8-bit multipliers in convolutional layers. A suitable approximate multiplier is then selected for each computing element from a library of approximate multipliers in such a way that (i) one approximate multiplier serves several layers, and (ii) the overall classification error and energy consumption are minimized. The optimization problems, including multiplier selection, are solved by means of the multiobjective optimization algorithm NSGA-II. To completely avoid the computationally expensive retraining of the DNN, which is usually employed to improve classification accuracy, we propose a simple weight updating scheme that compensates for the inaccuracy introduced by the approximate multipliers. The proposed approach is evaluated for two architectures of DNN accelerators with approximate multipliers from the open-source "EvoApprox" library. We report that the proposed approach saves 30% of the energy needed for multiplication in the convolutional layers of ResNet-50 while the accuracy is degraded by only 0.6%.
The proposed technique and approximate layers are available as an open-source extension of TensorFlow at https://github.com/ehw-fit/tf-approximate.
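
The abstract does not spell out the weight updating scheme, so the following is only a minimal sketch of one plausible reading: for every 8-bit weight, pick the replacement weight whose approximate products are closest to the exact ones over all activations. The multiplier `approx_mul` is a hypothetical stand-in that truncates the low bits of the activation; it is not an EvoApprox circuit, and the names `build_weight_map` and `total_error`, the unsigned activation range, and the absolute-error criterion are all assumptions made for illustration.

```python
import numpy as np

# Assumed operand ranges: unsigned 8-bit activations, signed 8-bit weights.
ACTS = np.arange(256, dtype=np.int32)
WEIGHTS = np.arange(-128, 128, dtype=np.int32)


def approx_mul(a, w, drop_bits=2):
    """Hypothetical approximate multiplier: clears the low bits of `a`
    before multiplying. A stand-in only, not a circuit from EvoApprox."""
    a = np.asarray(a, dtype=np.int32)
    w = np.asarray(w, dtype=np.int32)
    return ((a >> drop_bits) << drop_bits) * w


def build_weight_map(mul, acts=ACTS, weights=WEIGHTS):
    """For each weight w, return the weight w' minimizing
    sum over activations a of |mul(a, w') - a * w|."""
    # approx[i, j] = mul(acts[i], weights[j]) for every candidate w'
    approx = mul(acts[:, None], weights[None, :])
    mapped = np.empty_like(weights)
    for i, w in enumerate(weights):
        exact = (acts * w)[:, None]                # exact products, shape (A, 1)
        err = np.abs(approx - exact).sum(axis=0)   # one error per candidate w'
        mapped[i] = weights[np.argmin(err)]
    return mapped


def total_error(mul, mapped, acts=ACTS, weights=WEIGHTS):
    """Summed absolute product error when weight w is replaced by mapped[w]."""
    exact = acts[:, None] * weights[None, :]
    approx = mul(acts[:, None], mapped[None, :])
    return int(np.abs(approx - exact).sum())


weight_map = build_weight_map(approx_mul)
print("error without update:", total_error(approx_mul, WEIGHTS))
print("error with update:   ", total_error(approx_mul, weight_map))
```

Under these assumptions the mapping is a lookup table computed once per multiplier and applied to the quantized weights before deployment, so no gradient-based training is involved. Because the identity mapping is always among the candidates, the updated weights can never increase the summed product error for the chosen multiplier.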

W-AMA: Weight-aware Approximate Multiplication Architecture for Neural Processing

A Low-Power In-Memory Multiplication and Accumulation Array with Modified Radix-4 Input and Canonical Signed Digit Weights

A Convolutional Neural Network Accelerator Architecture with Fine-Granular Mixed Precision Configurability

A Near Memory Computing FPGA Architecture for Neural Network Acceleration

ALWANN: Automatic Layer-Wise Approximation of Deep Neural Network Accelerators without Retraining

GQNA: Generic Quantized DNN Accelerator with Weight-Repetition-Aware Activation Aggregating

A fine-grained mixed precision DNN accelerator using a two-stage big-little core RISC-V MCU

HEAM: High-Efficiency Approximate Multiplier Optimization for Deep Neural Networks

FAMES: Fast Approximate Multiplier Substitution for Mixed-Precision Quantized DNNs--Down to 2 Bits!

Fast, Scalable, Energy-Efficient Non-element-wise Matrix Multiplication on FPGA

Low Error-Rate Approximate Multiplier Design for DNNs with Hardware-Driven Co-Optimization

NASA: Neural Architecture Search and Acceleration for Hardware Inspired Hybrid Networks

Low Complexity Multiply-Accumulate Units for Convolutional Neural Networks with Weight-Sharing

An Ultra-Efficient Memristor-Based DNN Framework with Structured Weight Pruning and Quantization Using ADMM

Improving the accuracy of neural networks in analog computing-in-memory systems by analog weight

Energy-Efficient Architecture for FPGA-based Deep Convolutional Neural Networks with Binary Weights

WAGONN: Weight Bit Agglomeration in Crossbar Arrays for Reduced Impact of Interconnect Resistance on DNN Inference Accuracy

Structured Weight Matrices-Based Hardware Accelerators in Deep Neural Networks: FPGAs and ASICs

A Multiply-Less Approximate SRAM Compute-In-Memory Macro for Neural-Network Inference

A Data-Driven Asynchronous Neural Network Accelerator

A Charge-Domain Scalable-Weight In-Memory Computing Macro With Dual-SRAM Architecture for Precision-Scalable DNN Accelerators