Abstract: Convolutional neural networks (CNNs) have achieved major breakthroughs in recent years. Their performance in computer vision has matched, and in some areas even surpassed, human capabilities. Deep neural networks can capture complex non-linear features; however, this ability comes at the cost of high computational and memory requirements. State-of-the-art networks require billions of arithmetic operations and millions of parameters. To bring the power of deep learning to embedded devices such as smartphones, Google Glass and monitoring cameras, dedicated hardware accelerators can be used to decrease both execution time and power consumption. In applications where a fast connection to the cloud is not guaranteed or where privacy is important, computation needs to be done locally. Many hardware accelerators for deep neural networks have been proposed recently. An important first step of accelerator design is the hardware-oriented approximation of deep networks, which enables energy-efficient inference. We present Ristretto, a fast and automated framework for CNN approximation. Ristretto simulates the hardware arithmetic of a custom hardware accelerator. The framework reduces the bit-width of network parameters and of the outputs of resource-intensive layers, which significantly reduces the chip area needed for multiplication units. Alternatively, Ristretto can remove the need for multipliers altogether, resulting in adder-only arithmetic. The tool fine-tunes the trimmed networks to achieve high classification accuracy. Since training deep neural networks can be time-consuming, Ristretto uses highly optimized routines that run on the GPU. This enables fast compression of any given network. Given a maximum error tolerance of 1%, Ristretto can successfully condense CaffeNet and SqueezeNet to 8-bit. The code for Ristretto is available.
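
The bit-width reduction described in the abstract can be illustrated with a minimal sketch. This is not Ristretto's code: the function names (`choose_frac_bits`, `quantize_fixed_point`, `dequantize`, `quantize_power_of_two`), the per-tensor choice of fractional bits, and the exponent range are assumptions made for illustration only. The first part maps floating-point weights to signed 8-bit fixed-point codes; the second rounds a weight to a signed power of two, one common way to replace a multiplication with a bit shift and thereby approach adder-only arithmetic.

```python
import math


def choose_frac_bits(values, bit_width=8):
    """Pick the fractional length so the largest magnitude still fits.

    Assumption: one shared fractional length per group of values.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return bit_width - 1
    # Integer bits needed for max_abs; the sign bit is counted separately.
    int_bits = math.floor(math.log2(max_abs)) + 1
    return bit_width - 1 - int_bits


def quantize_fixed_point(values, bit_width=8):
    """Round values to signed fixed-point integer codes, saturating at the range limits."""
    frac_bits = choose_frac_bits(values, bit_width)
    scale = 2.0 ** frac_bits
    q_min = -(2 ** (bit_width - 1))
    q_max = 2 ** (bit_width - 1) - 1
    codes = [min(q_max, max(q_min, round(v * scale))) for v in values]
    return codes, frac_bits


def dequantize(codes, frac_bits):
    """Map integer codes back to the real values they represent."""
    return [c / 2.0 ** frac_bits for c in codes]


def quantize_power_of_two(value, min_exp=-8, max_exp=0):
    """Round a value to sign * 2**exp (hypothetical exponent range).

    Multiplying an integer activation by such a weight reduces to a shift.
    """
    if value == 0:
        return 0, min_exp
    sign = 1 if value > 0 else -1
    exp = round(math.log2(abs(value)))
    exp = min(max_exp, max(min_exp, exp))
    return sign, exp


if __name__ == "__main__":
    weights = [0.75, -0.31, 0.02, -1.2]
    codes, frac_bits = quantize_fixed_point(weights, bit_width=8)
    print("codes:", codes, "fractional bits:", frac_bits)
    print("reconstructed:", dequantize(codes, frac_bits))
    print("power-of-two form of 0.3:", quantize_power_of_two(0.3))
```

In this sketch the rounding error per weight is at most half a quantization step (2^-(frac_bits+1)) as long as no value saturates. A full approximation flow, as the abstract notes, would follow quantization with fine-tuning to recover accuracy; that step is not shown here.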

ConvReLU++: Reference-based Lossless Acceleration of Conv-ReLU Operations on Mobile CPU

MLCNN: Cross-Layer Cooperative Optimization and Accelerator Architecture for Speeding Up Deep Learning Applications

Deep Neural Network Acceleration with Sparse Prediction Layers

A Convolutional Neural Network Accelerator Architecture with Fine-Granular Mixed Precision Configurability

DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices

High performance ultra-low-precision convolutions on mobile devices

Sensitivity-based Acceleration and Compression Algorithm for Convolution Neural Network

Accelerating Convolutional Neural Networks for Continuous Mobile Vision Via Cache Reuse

Quantized Convolutional Neural Networks for Mobile Devices

FReLU: Flexible Rectified Linear Units for Improving Convolutional Neural Networks

Design of a Generic Dynamically Reconfigurable Convolutional Neural Network Accelerator with Optimal Balance

Efficient Integer-Arithmetic-Only Convolutional Neural Networks

Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications

Redundancy-Reduced MobileNet Acceleration on Reconfigurable Logic for ImageNet Classification

PCONV: the Missing but Desirable Sparsity in DNN Weight Pruning for Real-Time Execution on Mobile Devices

Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks

High Performance Depthwise and Pointwise Convolutions on Mobile Devices

A Fine-Grained End-to-End Latency Optimization Framework for Wireless Collaborative Inference

Efficient Integer-Arithmetic-Only Convolutional Networks with Bounded ReLU

Exploiting Sparsity to Accelerate Fully Connected Layers of CNN-Based Applications on Mobile SoCs

ABM-SpConv-SIMD: Accelerating Convolutional Neural Network Inference for Industrial IoT Applications on Edge Devices