Abstract: The primary operation in DNNs is the dot product of quantized input activations and weights. Prior works have proposed memory-centric architectures based on the Processing-In-Memory (PIM) paradigm. Resistive RAM (ReRAM) technology is especially appealing for PIM-based DNN accelerators due to its high density for storing weights, low leakage energy, low read latency, and ability to perform DNN dot products massively in parallel within the ReRAM crossbars. However, the main bottleneck of these architectures is the energy-hungry analog-to-digital (A/D) conversions required to perform analog computations in-ReRAM, which erode the efficiency and performance benefits of PIM. To improve the energy efficiency of in-ReRAM analog dot-product computations, we present ReDy, a hardware accelerator that implements a ReRAM-centric Dynamic quantization scheme to take advantage of the bit-serial streaming and processing of activations. The energy consumption of ReRAM-based DNN accelerators is directly proportional to the numerical precision of the input activations of each DNN layer. In particular, ReDy exploits the fact that activations of CONV layers in Convolutional Neural Networks (CNNs), a subset of DNNs, are commonly grouped according to the size of their filters and the size of the ReRAM crossbars. ReDy then quantizes each group of activations on the fly with a different numerical precision, based on a novel heuristic that takes into account the statistical distribution of each group. Overall, ReDy greatly reduces the activity of the ReRAM crossbars and the number of A/D conversions compared to a static 8-bit uniform quantization. We evaluate ReDy on a popular set of modern CNNs. On average, ReDy provides 13% energy savings over an ISAAC-like accelerator with negligible accuracy loss and area overhead.
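
The link between per-group precision and bit-serial work can be illustrated with a minimal Python sketch. It is not ReDy's method: the abstract does not specify the heuristic, so `bits_for_group` is a hypothetical stand-in that simply picks the smallest bit width able to hold the largest activation in a group. The cost model is also an assumption: each streamed activation bit of a group is counted as one crossbar pass (with its A/D conversions), and activations are assumed to be non-negative integers already quantized to at most 8 bits.

```python
from typing import List, Tuple


def split_into_groups(activations: List[int], group_size: int) -> List[List[int]]:
    """Partition a flat activation list into consecutive groups.

    group_size is a hypothetical parameter standing in for the grouping
    that the filter size and crossbar size would impose.
    """
    return [activations[i:i + group_size]
            for i in range(0, len(activations), group_size)]


def bits_for_group(group: List[int], max_bits: int = 8) -> int:
    """Stand-in heuristic (NOT the one used by ReDy): the smallest bit
    width that represents the group's largest value, clamped to
    the range [1, max_bits]."""
    peak = max(group)
    return max(1, min(max_bits, peak.bit_length()))


def bit_serial_passes(activations: List[int], group_size: int,
                      max_bits: int = 8) -> Tuple[int, int]:
    """Return (dynamic, static) pass counts under the assumed cost model:
    one crossbar pass per streamed bit per group."""
    groups = split_into_groups(activations, group_size)
    dynamic = sum(bits_for_group(g, max_bits) for g in groups)
    static = max_bits * len(groups)
    return dynamic, static


if __name__ == "__main__":
    acts = [3, 0, 1, 2, 200, 17, 5, 90, 0, 0, 1, 0]
    dynamic, static = bit_serial_passes(acts, group_size=4)
    print(f"dynamic passes: {dynamic}, static 8-bit passes: {static}")
```

With these illustrative inputs, the three groups need 2, 8, and 1 bits, so the dynamic scheme streams 11 bit-slices where a static 8-bit scheme would stream 24. A heuristic that may also drop bits based on the value distribution of a group, as the abstract describes, would trade some precision for further reductions; this sketch only removes bits that carry no information.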

SRAM-Based Processing-In-Memory Design with Kullback-Leibler Divergence-Based Dynamic Precision Quantization

An 8-Bit in Resistive Memory Computing Core with Regulated Passive Neuron and Bitline Weight Mapping

Mitigating RC-Delay Induced Accuracy Loss in Analog In-Memory Computing: A Non-Compromising Approach

APIM: An Antiferromagnetic MRAM-Based Processing-In-Memory System for Efficient Bit-level Operations of Quantized Convolutional Neural Networks

An Energy-Efficient Quantized and Regularized Training Framework for Processing-In-Memory Accelerators

A Configurable Multi-Precision CNN Computing Framework Based on Single Bit RRAM

PIM-QAT: Neural Network Quantization for Processing-In-Memory (PIM) Systems

A fine-grained mixed precision DNN accelerator using a two-stage big-little core RISC-V MCU

Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals

CIMQ: A Hardware-Efficient Quantization Framework for Computing-In-Memory Based Neural Network Accelerators

Design Framework for SRAM-Based Computing-In-Memory Edge CNN Accelerators

A high-speed reusable quantized hardware accelerator design for CNN on constrained edge device

A 1T2R1C ReRAM CIM Accelerator with Energy-Efficient Voltage Division and Capacitive Coupling for CNN Acceleration in AI Edge Applications

SDP: Co-Designing Algorithm, Dataflow, and Architecture for In-SRAM Sparse NN Acceleration

ReDy: A Novel ReRAM-centric Dynamic Quantization Approach for Energy-efficient CNN Inference

Mixed Precision Quantization for ReRAM-based DNN Inference Accelerators

CAP-RAM: A Charge-Domain In-Memory Computing 6T-SRAM for Accurate and Precision-Programmable CNN Inference

VSPIM: SRAM Processing-in-Memory DNN Acceleration via Vector-Scalar Operations

A 4-Kb 1-to-8-bit Configurable 6T SRAM-Based Computation-in-Memory Unit-Macro for CNN-Based AI Edge Processors

SRAM-Based CIM Architecture Design for Event Detection

Block-Wise Mixed-Precision Quantization: Enabling High Efficiency for Practical ReRAM-based DNN Accelerators