Abstract:Due to less memory requirement, low computation overhead and negligible accuracy degradation, deep neural networks with binary/ternary weights (BTNNs) have been widely employed on low-power mobile and Internet of Things (IoT) devices with limited storage capacity. Some hardware implementations have been proposed to accelerate the inference of BTNNs by utilizing the multiplication-free feature. However, some implicit characteristics in BTNN convolution, such as high arithmetic complexity and numerous redundant operations, are never considered. In this paper, we propose four optimization techniques to fully exploit these features. First, a feature-integral-based convolution (FIBC) method is proposed to reduce the arithmetic complexity of convolutional layers. Second, a kernel-transformation-feature-reconstruction (KTFR) convolution method is presented to remove redundant operations in BTNN convolution. Third, a hierarchical load-balancing mechanism (HLBM) is designed to eliminate zero value computation and improve resource utilization. Finally, a joint optimization approach for convolutional layers is proposed to search optimal calculation pattern for each layer. Based on the proposed four techniques, we design a reconfigurable processor in a 28-nm CMOS technology to accelerate the inferences of BTNNs. The four proposed techniques improve energy efficiency by 2.07 <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$\times $ </tex-math></inline-formula> , 1.65 <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$\times $ </tex-math></inline-formula> , 1.25 <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$\times $ </tex-math></inline-formula> , and 2.24 <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$\times $ </tex-math></inline-formula> for BTNNs respectively, compared with the baseline implementation which disables the proposed techniques. Benchmarked with binary-weight AlexNet, the processor achieves an energy efficiency of 19.9 TOPS/W at 200 MHz and 0.9 V.

A 617-TOPS/W All-Digital Binary Neural Network Accelerator in 10-nm FinFET CMOS

XNOR Neural Engine: A Hardware Accelerator IP for 21.6-fJ/op Binary Neural Network Inference

7.5 A 65nm 0.39-to-140.3 TOPS/W 1-to-12b Unified Neural Network Processor Using Block-Circulant-Enabled Transpose-Domain Acceleration with 8.1× Higher TOPS/mm 2 and 6T HBST …

An Approach of Binary Neural Network Energy-Efficient Implementation

An Energy-Efficient Reconfigurable Processor for Binary-and Ternary-Weight Neural Networks With Flexible Data Bit Width

An Ultra-High Energy-Efficient Reconfigurable Processor for Deep Neural Networks with Binary/Ternary Weights in 28NM CMOS

7.5 A 65nm 0.39-to-140.3tops/w 1-to-12b Unified Neural Network Processor Using Block-Circulant-Enabled Transpose-Domain Acceleration with 8.1 × Higher TOPS/mm2and 6T HBST-TRAM-Based 2D Data-Reuse Architecture

A fine-grained mixed precision DNN accelerator using a two-stage big-little core RISC-V MCU.

Utilizing Dual-Port FeFETs for Energy-Efficient Binary Neural Network Inference Accelerators

Reconfigurable Binary Neural Network Accelerator with Adaptive Parallelism Scheme

NEXUS: A 28nm 3.3pJ/SOP 16-Core Spiking Neural Network with a Diamond Topology for Real-Time Data Processing

MorphBungee: A 65-nm 7.2-mm2 27-μJ/image Digital Edge Neuromorphic Chip with On-Chip 802-frame/s Multi-Layer Spiking Neural Network Learning

A 1.06-to-5.09 TOPS/W Reconfigurable Hybrid-Neural-Network Processor for Deep Learning Applications

SpiDR: A Reconfigurable Digital Compute-in-Memory Spiking Neural Network Accelerator for Event-based Perception

Binary Neural Network with 16 Mb Rram Macro Chip for Classification and Online Training

TCN-CUTIE: A 1036 TOp/s/W, 2.72 uJ/Inference, 12.2 mW All-Digital Ternary Accelerator in 22 nm FDX Technology

A 28nm Configurable Asynchronous SNN Accelerator with Energy-Efficient Learning

A 28-Nm 198.9-TOPS/W Fault-Tolerant Stochastic Computing Neural Network Processor

A Low-Power Spiking Neural Network Chip Based on a Compact LIF Neuron and Binary Exponential Charge Injector Synapse Circuits

ROBIN: A Robust Optical Binary Neural Network Accelerator

Neural Synaptic Plasticity-Inspired Computing: A High Computing Efficient Deep Convolutional Neural Network Accelerator