Abstract: Owing to their low memory requirements, low computation overhead, and negligible accuracy degradation, deep neural networks with binary/ternary weights (BTNNs) have been widely deployed on low-power mobile and Internet of Things (IoT) devices with limited storage capacity. Several hardware implementations have been proposed to accelerate BTNN inference by exploiting its multiplication-free property. However, some implicit characteristics of BTNN convolution, such as high arithmetic complexity and numerous redundant operations, have not been considered. In this paper, we propose four optimization techniques to fully exploit these characteristics. First, a feature-integral-based convolution (FIBC) method is proposed to reduce the arithmetic complexity of convolutional layers. Second, a kernel-transformation-feature-reconstruction (KTFR) convolution method is presented to remove redundant operations in BTNN convolution. Third, a hierarchical load-balancing mechanism (HLBM) is designed to eliminate zero-value computation and improve resource utilization. Finally, a joint optimization approach for convolutional layers is proposed to search for the optimal calculation pattern for each layer. Based on these four techniques, we design a reconfigurable processor in 28-nm CMOS technology to accelerate BTNN inference.
The four proposed techniques improve energy efficiency by 2.07×, 1.65×, 1.25×, and 2.24× for BTNNs, respectively, compared with the baseline implementation which disables the proposed techniques. Benchmarked with binary-weight AlexNet, the processor achieves an energy efficiency of 19.9 TOPS/W at 200 MHz and 0.9 V.
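
As a rough illustration of the multiplication-free property mentioned in the abstract, the sketch below computes a convolution with weights restricted to {-1, 0, +1} using only additions and subtractions, and skips zero weights. It is a minimal software sketch, not the paper's FIBC, KTFR, or HLBM methods, whose details are not given here; the function names `ternary_dot` and `ternary_conv2d` are hypothetical, and the convolution assumes stride 1, no padding, and the usual CNN cross-correlation convention.

```python
def ternary_dot(features, weights):
    """Dot product with weights in {-1, 0, +1}, using only add/subtract.

    Hypothetical helper for illustration; not taken from the paper.
    """
    acc = 0
    for x, w in zip(features, weights):
        if w == 1:
            acc += x          # +1 weight: accumulate the feature
        elif w == -1:
            acc -= x          # -1 weight: subtract the feature
        elif w != 0:
            raise ValueError("weights must be -1, 0, or +1")
        # w == 0: the operation is skipped entirely
    return acc


def ternary_conv2d(feature_map, kernel):
    """2-D convolution (stride 1, no padding) with a ternary kernel.

    feature_map and kernel are lists of lists; the kernel is applied
    without flipping, as is conventional in CNN frameworks.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(feature_map) - kh + 1
    out_w = len(feature_map[0]) - kw + 1
    flat_kernel = [w for row in kernel for w in row]
    output = []
    for i in range(out_h):
        out_row = []
        for j in range(out_w):
            # Gather the window in the same row-major order as flat_kernel.
            window = [feature_map[i + di][j + dj]
                      for di in range(kh) for dj in range(kw)]
            out_row.append(ternary_dot(window, flat_kernel))
        output.append(out_row)
    return output


if __name__ == "__main__":
    fmap = [[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]]
    kernel = [[1, -1],
              [0, 1]]
    print(ternary_conv2d(fmap, kernel))
```

A binary-weight network is the special case in which no weight is zero, so every feature is either added or subtracted.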

An Energy-Efficient Architecture for Binary Weight Convolutional Neural Networks

A Convolutional Neural Network Accelerator Architecture with Fine-Granular Mixed Precision Configurability

Energy-Efficient Architecture for FPGA-based Deep Convolutional Neural Networks with Binary Weights

An Energy-Efficient Reconfigurable Processor for Binary- and Ternary-Weight Neural Networks With Flexible Data Bit Width

An Approach of Binary Neural Network Energy-Efficient Implementation

Binary Convolutional Neural Network on RRAM

An Efficient Kernel Transformation Architecture for Binary- and Ternary-Weight Neural Network Inference

A Proposal for Energy-Efficient Cellular Neural Network based on Spintronic Devices

Hybrid Stochastic-Binary Computing for Low-Latency and High-Precision Inference of CNNs

YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights

24.4 Sandwich-RAM: an Energy-Efficient In-Memory BWN Architecture with Pulse-Width Modulation

Neural Synaptic Plasticity-Inspired Computing: A High Computing Efficient Deep Convolutional Neural Network Accelerator

An Ultra-High Energy-Efficient Reconfigurable Processor for Deep Neural Networks with Binary/Ternary Weights in 28NM CMOS

A Multilevel Cell STT-MRAM-Based Computing In-Memory Accelerator for Binary Convolutional Neural Network

BR-CIM: an Efficient Binary Representation Computation-In-Memory Design

XNORAM: an Efficient Computing-in-Memory Architecture for Binary Convolutional Neural Networks with Flexible Dataflow Mapping

Exploiting Kernel Compression on BNNs

AICNN: Implementing Typical CNN Algorithms with Analog-to-Information Conversion Architecture

A 68 mw 2.2 Tops/w low bit-width and multiplierless DCNN object detection processor for visually impaired people

Binary Neural Networks as a general-propose compute paradigm for on-device computer vision

Recursive Binary Neural Network Training Model for Efficient Usage of On-Chip Memory