Abstract: Owing to their lower memory requirements, low computational overhead, and negligible accuracy degradation, deep neural networks with binary/ternary weights (BTNNs) have been widely deployed on low-power mobile and Internet of Things (IoT) devices with limited storage capacity. Several hardware implementations have been proposed to accelerate BTNN inference by exploiting its multiplication-free nature. However, some implicit characteristics of BTNN convolution, such as high arithmetic complexity and numerous redundant operations, have not been considered. In this paper, we propose four optimization techniques to fully exploit these characteristics. First, a feature-integral-based convolution (FIBC) method is proposed to reduce the arithmetic complexity of convolutional layers. Second, a kernel-transformation-feature-reconstruction (KTFR) convolution method is presented to remove redundant operations in BTNN convolution. Third, a hierarchical load-balancing mechanism (HLBM) is designed to eliminate zero-value computation and improve resource utilization. Finally, a joint optimization approach for convolutional layers is proposed to search for the optimal calculation pattern for each layer. Based on these four techniques, we design a reconfigurable processor in a 28-nm CMOS technology to accelerate BTNN inference.
The four proposed techniques improve energy efficiency by 2.07×, 1.65×, 1.25×, and 2.24× for BTNNs, respectively, compared with the baseline implementation which disables the proposed techniques. Benchmarked with binary-weight AlexNet, the processor achieves an energy efficiency of 19.9 TOPS/W at 200 MHz and 0.9 V.
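
As a rough illustration of the multiplication-free property and zero-value skipping mentioned in the abstract, the Python sketch below computes a ternary-weight dot product using only additions and subtractions. It also shows a generic identity for binary weights, sum(w*x) = sum(x) - 2 * sum(x where w = -1), which lets one window sum be shared across kernels. This is not the paper's FIBC, KTFR, or HLBM; the abstract does not give their details, and the function names and sample values here are hypothetical.

```python
def ternary_dot(weights, features):
    """Dot product with weights in {-1, 0, +1} using only add/subtract."""
    acc = 0
    for w, x in zip(weights, features):
        if w == 0:
            continue  # zero weight contributes nothing, so skip it
        acc = acc + x if w > 0 else acc - x
    return acc


def binary_dot_shared_sum(weights, features, window_sum):
    """Binary (+1/-1) dot product derived from a precomputed window sum.

    Relies on sum(w*x) = sum(x) - 2 * sum(x where w == -1); window_sum
    can be reused by every kernel applied to the same window.
    """
    neg = sum(x for w, x in zip(weights, features) if w < 0)
    return window_sum - 2 * neg  # doubling is a 1-bit shift in hardware


# Hypothetical sample data, chosen only for illustration.
features = [3, -1, 4, 1, -5, 9]
ternary_w = [1, 0, -1, 1, 0, -1]
binary_w = [1, -1, -1, 1, 1, -1]

# Reference results computed with ordinary multiplications.
reference_t = sum(w * x for w, x in zip(ternary_w, features))
reference_b = sum(w * x for w, x in zip(binary_w, features))

result_t = ternary_dot(ternary_w, features)
result_b = binary_dot_shared_sum(binary_w, features, sum(features))
```

Both results match the multiplication-based references, which is the property that multiplication-free hardware builds on.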

PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices

Explore Training of Deep Convolutional Neural Networks on Battery-powered Mobile Devices: Design and Application

Condense: A Framework for Device and Frequency Adaptive Neural Network Models on the Edge

All-in-One: A Highly Representative DNN Pruning Framework for Edge Devices with Dynamic Power Management

MOC: Multi-Objective Mobile CPU-GPU Co-Optimization for Power-Efficient DNN Inference

DACO: Pursuing Ultra-low Power Consumption Via DNN-Adaptive CPU-GPU CO-optimization on Mobile Devices

AccEPT: an Acceleration Scheme for Speeding Up Edge Pipeline-parallel Training

PowerPruning: Selecting Weights and Activations for Power-Efficient Neural Network Acceleration

Demystifying TensorRT: Characterizing Neural Network Inference Engine on Nvidia Edge Devices

Mobile or FPGA? A Comprehensive Evaluation on Energy Efficiency and a Unified Optimization Framework

Research on Convolutional Neural Network Inference Acceleration and Performance Optimization for Edge Intelligence

PowerTrain: Fast, Generalizable Time and Power Prediction Models to Optimize DNN Training on Accelerated Edges

NeuralPower: Predict and Deploy Energy-Efficient Convolutional Neural Networks

An Energy-Efficient Reconfigurable Processor for Binary- and Ternary-Weight Neural Networks With Flexible Data Bit Width

Google Neural Network Models for Edge Devices: Analyzing and Mitigating Machine Learning Inference Bottlenecks

AdaOper: Energy-efficient and Responsive Concurrent DNN Inference on Mobile Devices

Incremental Training and Group Convolution Pruning for Runtime DNN Performance Scaling on Heterogeneous Embedded Platforms

Edge Intelligence: On-Demand Deep Learning Model Co-Inference with Device-Edge Synergy

Efficient Hardware Optimization Strategies For Deep Neural Networks Acceleration Chip

Energy Consumption of Neural Networks on NVIDIA Edge Boards: an Empirical Model