Abstract:Resistive random-access-memory (ReRAM) crossbar is a promising technique for deep neural network (DNN) accelerators, thanks to its in-memory and in-situ analog computing abilities for vector–matrix multiplication-and-accumulations (VMMs). However, it is challenging for crossbar architecture to exploit the sparsity in DNNs. It is inevitably complex and costly to exploit fine-grained sparsity due to the limitation of the tightly coupled crossbar structure. As a countermeasure, we develop a novel ReRAM-based DNN accelerator, named sparse-multiplication-engine (SME), based on a hardware and software co-design framework. First, we orchestrate the bit-sparse pattern to increase the density of bit-sparsity based on existing quantization methods. Such quantized weights can be nicely generated using the alternating direction method of multipliers (ADMM) optimization during the DNN fine-tuning, which can exactly enforce bit patterns in weights. Second, we propose a novel weight mapping mechanism to slice the bits of the weight across crossbars and splice the activation results in peripheral circuits. This mechanism can decouple the tightly coupled crossbar structure and cumulate the sparsity in the crossbar. Finally, a superior squeeze-out scheme empties the crossbars mapped with highly sparse nonzeros from the previous two steps. We design the SME architecture and discuss its use for other quantization methods and different ReRAM cell technologies. We further propose a workload grouping algorithm and a pipeline to achieve workload balance among crossbar-rows that concurrently execute multiply–accumulate operations to optimize the system latency. Putting all together, with the optimized model, compared with prior state-of-the-art designs, the SME shrinks the use of crossbars up to $8.7\times $ and $2.1\times $ using ResNet-50 and MobileNet-v2, respectively, and achieve average $3.1\times $ speed up with no or little accuracy loss on ImageNet.

ESNreram: an Energy-Efficient Sparse Neural Network Based on Resistive Random-Access Memory.

SNrram: an Efficient Sparse Neural Network Computation Architecture Based on Resistive Random-Access Memory.

BitSNNs: Revisiting Energy-efficient Spiking Neural Networks

Learning the Sparsity for ReRAM

Learning the sparsity for ReRAM - mapping and pruning sparse neural network for ReRAM based accelerator.

SME: ReRAM-based Sparse-Multiplication-Engine to Squeeze-Out Bit Sparsity of Neural Network

CREAM: Computing in ReRAM-Assisted Energy- and Area-Efficient SRAM for Reliable Neural Network Acceleration.

Twofold Sparsity: Joint Bit- and Network-Level Sparsity for Energy-Efficient Deep Neural Network Using RRAM Based Compute-In-Memory

ReRAM-Sharing: Fine-Grained Weight Sharing for ReRAM-Based Deep Neural Network Accelerator.

Bit-Transformer: Transforming Bit-level Sparsity into Higher Preformance in ReRAM-based Accelerator

ReDy: A Novel ReRAM-centric Dynamic Quantization Approach for Energy-efficient CNN Inference

Energy Efficient RRAM Spiking Neural Network for Real Time Classification

An Energy-Efficient Mixed-Bit CNN Accelerator With Column Parallel Readout for ReRAM-Based In-Memory Computing

SoBS-X: Squeeze-Out Bit Sparsity for ReRAM-Crossbar-Based Neural Network Accelerator.

Towards ReRAM-based Accelerator:An Energy-efficient NN Model Compression Framework

RRAM-DNN: an RRAM and Model-Compression Empowered All-Weights-On-Chip DNN Accelerator

TIME: A Training-in-Memory Architecture for RRAM-Based Deep Neural Networks

ReCom: an Efficient Resistive Accelerator for Compressed Deep Neural Networks.

Multicore Spiking Neuromorphic Chip in 180-nm with ReRAM Synapses and Digital Neurons

EnforceSNN: Enabling Resilient and Energy-Efficient Spiking Neural Network Inference considering Approximate DRAMs for Embedded Systems

Spiking Neural Network with RRAM: Can We Use It for Real-World Application?