Abstract: State-of-the-art approaches employ approximate computing to reduce the energy consumption of DNN hardware. Approximate DNNs then require extensive retraining to recover from the accuracy loss caused by the use of approximate operations. However, retraining of complex DNNs does not scale well. In this paper, we demonstrate that efficient approximations can be introduced into the computational path of DNN accelerators while retraining can be avoided completely. The proposed method, ALWANN, provides highly optimized implementations of DNNs for custom low-power accelerators in which the number of computing units is lower than the number of DNN layers. First, a fully trained DNN is converted to operate with 8-bit weights and 8-bit multipliers in convolutional layers. A suitable approximate multiplier is then selected for each computing element from a library of approximate multipliers in such a way that (i) one approximate multiplier serves several layers, and (ii) the overall classification error and energy consumption are minimized. The optimizations, including the multiplier selection problem, are solved by means of the multiobjective NSGA-II optimization algorithm. In order to completely avoid the computationally expensive retraining of the DNN, which is usually employed to improve the classification accuracy, we propose a simple weight updating scheme that compensates for the inaccuracy introduced by employing approximate multipliers. The proposed approach is evaluated for two architectures of DNN accelerators with approximate multipliers from the open-source "EvoApprox" library. We report that the proposed approach saves 30% of the energy needed for multiplication in convolutional layers of ResNet-50 while the accuracy is degraded by only 0.6%.
The proposed technique and approximate layers are available as an open-source extension of TensorFlow at https://github.com/ehw-fit/tf-approximate.
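
The abstract does not spell out the weight updating scheme, so the following is only a minimal sketch of one plausible reading: for each 8-bit weight, pick a nearby replacement weight whose approximate products are, summed over all 8-bit activations, closest to the exact products of the original weight. Everything named here is an assumption made for illustration: `approx_mul` is a hypothetical stand-in multiplier (it drops the two low bits of each operand) and is not a circuit from the EvoApprox library, and `total_error`, `tune_weights`, and the search `window` are invented names and parameters.

```python
# Hypothetical sketch of weight tuning for an approximate multiplier.
# Not the authors' implementation; approx_mul is an invented stand-in.

ACTIVATIONS = range(-128, 128)  # assumed signed 8-bit activation range


def approx_mul(a: int, w: int) -> int:
    """Stand-in approximate multiplier: clear the 2 low bits of each magnitude."""
    sign = -1 if (a < 0) != (w < 0) else 1
    return sign * ((abs(a) & ~0x3) * (abs(w) & ~0x3))


def total_error(w_exact: int, w_used: int) -> int:
    """Sum of absolute errors when w_used replaces w_exact in the approximate multiplier."""
    return sum(abs(approx_mul(a, w_used) - a * w_exact) for a in ACTIVATIONS)


def tune_weights(window: int = 4) -> dict:
    """Map each 8-bit weight to the candidate within +/- window with the lowest total error.

    Ties are broken in favour of the candidate closest to the original weight.
    """
    mapping = {}
    for w in range(-128, 128):
        lo, hi = max(-128, w - window), min(127, w + window)
        mapping[w] = min(
            range(lo, hi + 1),
            key=lambda c: (total_error(w, c), abs(c - w)),
        )
    return mapping


tuned = tune_weights()
before = sum(total_error(w, w) for w in tuned)
after = sum(total_error(w, c) for w, c in tuned.items())
print(f"total error without tuning: {before}")
print(f"total error with tuning:    {after}")
```

Because the original weight is always among the candidates, the tuned mapping can never be worse than leaving the weights unchanged under this error measure. The mapping is computed once per multiplier and applied to the stored weights offline, which is why no retraining is involved in this sketch.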

Timing Error Tolerant CNN Accelerator with Layer-Wise Approximate Multiplication

MLCNN: Cross-Layer Cooperative Optimization and Accelerator Architecture for Speeding Up Deep Learning Applications

A Convolutional Neural Network Accelerator Architecture with Fine-Granular Mixed Precision Configurability

A Reconfigurable Approximate Multiplier for Quantized CNN Applications

Approximate Processing Element Design and Analysis for the Implementation of CNN Accelerators

A Low-Power DNN Accelerator with Mean-Error-Minimized Approximate Signed Multiplier

Work-in-Process: Error-Compensation-Based Energy-Efficient MAC Unit for CNNs

A Precision-Scalable Energy-Efficient Convolutional Neural Network Accelerator

An Energy-Efficient Time-Domain Binary Neural Network Accelerator with Error-Detection in 28nm CMOS

ALWANN: Automatic Layer-Wise Approximation of Deep Neural Network Accelerators without Retraining

Mutual Error Compensation Based Area and Power Efficient Approximate Multiplier

Self-compensation Tensor Multiplication Unit for Adaptive Approximate Computing in Low-Power CNN Processing

ARA: Cross-Layer Approximate Computing Framework Based Reconfigurable Architecture for CNNs

Optimally Approximated and Unbiased Floating-Point Multiplier with Runtime Configurability

An Efficient and Reliable Negative Margin Timing Error Detection for Neural Network Accelerator Without Accuracy Loss in 28nm CMOS

An Energy-Efficient Multiplier Using Hybrid Approximate Logic Synthesis for Mixed-Quantization CNNs

Layer-Wise Mixed-Modes CNN Processing Architecture With Double-Stationary Dataflow and Dimension-Reshape Strategy

A CGP-based Efficient Approximate Multiplier with Error Compensation

A Time-Domain Binary CNN Engine with Error-Detection-Based Resilience in 28nm CMOS

Hybrid Stochastic-Binary Computing for Low-Latency and High-Precision Inference of CNNs

Improving the Performance of CNN Accelerator Architecture under the Impact of Process Variations