Abstract:Due to less memory requirement, low computation overhead and negligible accuracy degradation, deep neural networks with binary/ternary weights (BTNNs) have been widely employed on low-power mobile and Internet of Things (IoT) devices with limited storage capacity. Some hardware implementations have been proposed to accelerate the inference of BTNNs by utilizing the multiplication-free feature. However, some implicit characteristics in BTNN convolution, such as high arithmetic complexity and numerous redundant operations, are never considered. In this paper, we propose four optimization techniques to fully exploit these features. First, a feature-integral-based convolution (FIBC) method is proposed to reduce the arithmetic complexity of convolutional layers. Second, a kernel-transformation-feature-reconstruction (KTFR) convolution method is presented to remove redundant operations in BTNN convolution. Third, a hierarchical load-balancing mechanism (HLBM) is designed to eliminate zero value computation and improve resource utilization. Finally, a joint optimization approach for convolutional layers is proposed to search optimal calculation pattern for each layer. Based on the proposed four techniques, we design a reconfigurable processor in a 28-nm CMOS technology to accelerate the inferences of BTNNs. The four proposed techniques improve energy efficiency by 2.07 <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$\times $ </tex-math></inline-formula> , 1.65 <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$\times $ </tex-math></inline-formula> , 1.25 <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$\times $ </tex-math></inline-formula> , and 2.24 <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$\times $ </tex-math></inline-formula> for BTNNs respectively, compared with the baseline implementation which disables the proposed techniques. Benchmarked with binary-weight AlexNet, the processor achieves an energy efficiency of 19.9 TOPS/W at 200 MHz and 0.9 V.

Quality Driven Systematic Approximation for Binary-Weight Neural Network Deployment

An Efficient BCNN Deployment Method Using Quality-Aware Approximate Computing.

Background Noise Adaptive Energy-Efficient Keywords Recognition Processor with Reusable DNN and Reconfigurable Architecture

An Ultra-low Power Keyword-Spotting Accelerator Using Circuit-Architecture-System Co-design and Self-adaptive Approximate Computing Based BWN

A Time-Domain Binary CNN Engine with Error-Detection-Based Resilience in 28nm CMOS

QCNN Inspired Reconfigurable Keyword Spotting Processor with Hybrid Data-Weight Reuse Methods

Binarized Weight Neural-Network Inspired Ultra-Low Power Speech Recognition Processor with Time-Domain Based Digital-Analog Mixed Approximate Computing

Low Power Keyword Recognition Accelerator based on Approximate Calculation of Deep-Shift Neural Network

A 22nm, 10.8 μ W/15.1 μ W Dual Computing Modes High Power-Performance-Area Efficiency Domained Background Noise Aware Keyword- Spotting Processor

An Ultra-Low Power Always-On Keyword Spotting Accelerator Using Quantized Convolutional Neural Network and Voltage-Domain Analog Switching Network-Based Approximate Computing

A 1D-CRNN Inspired Reconfigurable Processor for Noise-robust Low-power Keywords Recognition

An Energy-efficient Reconfigurable Hybrid DNN Architecture for Speech Recognition with Approximate Computing.

Low-complex and Highly-performed Binary Residual Neural Network for Small-footprint Keyword Spotting

A Low Power DNN-based Speech Recognition Processor with Precision Recoverable Approximate Computing

An Energy-Efficient Time-Domain Binary Neural Network Accelerator with Error-Detection in 28nm CMOS

A 22nm, 10.8 <italic>μ</italic> W/15.1 <italic>μ</italic> W Dual Computing Modes High Power-Performance-Area Efficiency Domained Background Noise Aware Keyword- Spotting Processor

A 3.8-Μw 10-Keyword Noise-Robust Keyword Spotting Processor Using Symmetric Compressed Ternary-Weight Neural Networks

An Energy-Efficient Architecture for Binary Weight Convolutional Neural Networks

Low-Power Computing Unit based on Heterogeneous Approximate Structure for Binary Convolutional Neural Network

EERA-KWS: A 163 TOPS/W Always-on Keyword Spotting Accelerator in 28nm CMOS Using Binary Weight Network and Precision Self-Adaptive Approximate Computing.

An Energy-Efficient Reconfigurable Processor for Binary-and Ternary-Weight Neural Networks With Flexible Data Bit Width