Abstract: The primary operation in DNNs is the dot product of quantized input activations and weights. Prior works have proposed memory-centric architectures based on the Processing-In-Memory (PIM) paradigm. Resistive RAM (ReRAM) technology is especially appealing for PIM-based DNN accelerators due to its high density for storing weights, low leakage energy, low read latency, and ability to perform the DNN dot products massively in parallel within the ReRAM crossbars. However, the main bottleneck of these architectures is the energy-hungry analog-to-digital conversions (ADCs) required to perform analog computations in-ReRAM, which penalize the efficiency and performance benefits of PIM. To improve the energy efficiency of in-ReRAM analog dot-product computations, we present ReDy, a hardware accelerator that implements a ReRAM-centric Dynamic quantization scheme to take advantage of the bit-serial streaming and processing of activations. The energy consumption of ReRAM-based DNN accelerators is directly proportional to the numerical precision of the input activations of each DNN layer. In particular, ReDy exploits the fact that activations of CONV layers in Convolutional Neural Networks (CNNs), a subset of DNNs, are commonly grouped according to the size of their filters and the size of the ReRAM crossbars. ReDy then quantizes each group of activations on the fly with a different numerical precision, based on a novel heuristic that takes into account the statistical distribution of each group. Overall, ReDy greatly reduces the activity of the ReRAM crossbars and the number of A/D conversions compared to a static 8-bit uniform quantization. We evaluate ReDy on a popular set of modern CNNs. On average, ReDy provides 13% energy savings over an ISAAC-like accelerator with negligible accuracy loss and area overhead.
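To make the link between activation precision and crossbar activity concrete, the following is a minimal Python sketch of bit-serial dot-product processing with a per-group bit-width. It is an illustration under stated assumptions, not ReDy's implementation: the abstract does not specify the heuristic, so `bits_for_group` is a hypothetical stand-in that simply picks the smallest bit-width able to represent the largest activation in a group. The function names, the group size, and the assumption of one crossbar activation (and one round of A/D conversions) per streamed bit are likewise ours.

```python
from typing import List, Tuple


def bit_serial_dot(acts: List[int], weights: List[int], bits: int) -> Tuple[int, int]:
    """Dot product computed one activation bit-plane at a time.

    Returns (result, cycles). Each cycle models one crossbar activation
    followed by A/D conversion; this cost model is an assumption.
    Activations must be non-negative and fit in `bits` bits.
    """
    result = 0
    for b in range(bits):
        # Partial sum for bit-plane b: only activations with bit b set contribute.
        partial = sum(w for a, w in zip(acts, weights) if (a >> b) & 1)
        # Shift-and-add: weight the partial sum by the bit position.
        result += partial * (1 << b)
    return result, bits


def bits_for_group(group: List[int], max_bits: int = 8) -> int:
    """Hypothetical stand-in heuristic (NOT the heuristic from the paper):
    the smallest bit-width that represents the group's largest value."""
    return max(1, min(max_bits, max(group).bit_length()))


def count_cycles(acts: List[int], group_size: int, max_bits: int = 8) -> Tuple[int, int]:
    """Return (dynamic_cycles, static_cycles) over all activation groups."""
    groups = [acts[i:i + group_size] for i in range(0, len(acts), group_size)]
    dynamic = sum(bits_for_group(g, max_bits) for g in groups)
    static = max_bits * len(groups)
    return dynamic, static


# Two groups of four 8-bit activations: one with small values, one with large values.
activations = [3, 0, 1, 2, 200, 17, 5, 128]
dynamic_cycles, static_cycles = count_cycles(activations, group_size=4)
print(dynamic_cycles, static_cycles)
```

In this toy example the first group needs only 2 bits while the second needs all 8, so the bit-serial stream takes 10 cycles instead of the 16 required by a static 8-bit scheme. The stand-in heuristic is lossless; a heuristic driven by the statistical distribution of each group, as the abstract describes, could trade a small quantization error for further reductions.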

Redundancy-Reduced MobileNet Acceleration on Reconfigurable Logic for ImageNet Classification

Low-res MobileNet: An efficient lightweight network for low-resolution image classification in resource-constrained scenarios

IremulbNet: Rethinking the inverted residual architecture for image recognition

ResNet Structure Simplification with the Convolutional Kernel Redundancy Measure

Achieving Pareto Optimality using Efficient Parameter Reduction for DNNs in Resource-Constrained Edge Environment

LR-CNN: Lightweight Row-centric Convolutional Neural Network Training for Memory Reduction

Memristor-Based MobileNetV3 Circuit Design for Image Classification

Designing Extremely Memory-Efficient CNNs for On-device Vision Tasks

T-RECX: Tiny-Resource Efficient Convolutional neural networks with early-eXit

ConvReLU++: Reference-based Lossless Acceleration of Conv-ReLU Operations on Mobile CPU

SlimConv: Reducing Channel Redundancy in Convolutional Neural Networks by Features Recombining

A Reliability-Concerned Compute-in-Memory Behavior Model for Convolutional Neural Network

A Novel Low-Communication Energy-Efficient Reconfigurable CNN Acceleration Architecture

Efficient convolutional neural networks on Raspberry Pi for image classification

Rethinking Mobile Block for Efficient Attention-based Models

KRR-CNN: kernels redundancy reduction in convolutional neural networks

Low Bit-Width Convolutional Neural Network on RRAM

ANTNets: Mobile Convolutional Neural Networks for Resource Efficient Image Classification

ReDy: A Novel ReRAM-centric Dynamic Quantization Approach for Energy-efficient CNN Inference

ShuffleNeMt: modern lightweight convolutional neural network architecture

Binary Convolutional Neural Network on RRAM.