Abstract: Resistive Random-Access Memory (ReRAM) crossbars are among the most promising neural network accelerators, thanks to their in-memory, in-situ analog computation of Matrix Multiplication-and-Accumulations (MACs). Two key limitations remain: 1) the number of rows and columns of ReRAM cells that can execute MACs concurrently is constrained, which limits in-memory computing throughput; and 2) high-precision analog-to-digital (A/D) conversions are costly and can offset the efficiency and performance benefits of ReRAM-based Process-In-Memory (PIM). It is also challenging to deploy large Deep Neural Network (DNN) models on crossbars, because the crossbar structure cannot effectively exploit the sparsity of DNNs, especially the sparsity in activations. As a countermeasure, we develop a novel ReRAM-based PIM accelerator, ERA-BS, which exploits the correlation between bit-level sparsity (in both weights and activations) and the performance of the ReRAM-based crossbar. We propose a bit-flip scheme combined with exponent-based quantization, which adaptively flips the bits of the mapped DNNs to release redundant space without sacrificing much accuracy or incurring much hardware overhead. We also design an architecture that integrates these techniques to substantially shrink the crossbar footprint that is needed. We further propose a dynamic activation sparsity exploitation scheme that works with the tightly coupled structure of the crossbar, comprising crossbar-aware activation pruning and ancillary run-time hardware support. In this way, we exploit fine-grained sparsity in both weights (static) and activations (dynamic) to improve performance while reducing the energy consumption of computation, with negligible overhead.
Our experiments on a wide variety of networks show that, compared to the well-known ReRAM-based PIM accelerator "ISAAC", ERA-BS achieves up to $43\times$, $78\times$, and $73\times$ higher energy efficiency, area efficiency, and throughput, respectively. Compared to the state-of-the-art ReRAM-based design "PIM-Prune", ERA-BS achieves $5.3\times$ higher energy efficiency, $7.2\times$ higher area efficiency, and a $32\times$ performance gain, with similar or even higher accuracy.
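
To make the notion of bit-level sparsity concrete, the following is a minimal illustrative sketch and not the ERA-BS bit-flip scheme itself. It assumes unsigned 8-bit quantized weights stored with one bit per ReRAM cell, so that each bit position forms its own "bit plane". The function names (`bit_planes`, `bit_sparsity`, `empty_planes`) and the sample weights are hypothetical. The sketch measures the fraction of zero bits and lists the bit planes that are entirely zero and therefore would not need crossbar cells under this assumed mapping.

```python
def bit_planes(weights, n_bits=8):
    """Split unsigned integer weights into bit planes, least significant bit first."""
    for w in weights:
        if not 0 <= w < (1 << n_bits):
            raise ValueError(f"weight {w} does not fit in {n_bits} unsigned bits")
    return [[(w >> b) & 1 for w in weights] for b in range(n_bits)]


def bit_sparsity(weights, n_bits=8):
    """Fraction of zero bits over all bit planes (assumed one bit per cell)."""
    planes = bit_planes(weights, n_bits)
    total = n_bits * len(weights)
    zeros = sum(bit == 0 for plane in planes for bit in plane)
    return zeros / total


def empty_planes(weights, n_bits=8):
    """Indices of bit planes that contain no 1 bits at all."""
    return [b for b, plane in enumerate(bit_planes(weights, n_bits)) if not any(plane)]


# Hypothetical quantized weights, for illustration only.
weights = [1, 2, 3, 16, 18, 0, 1, 2]
print(f"bit-level sparsity: {bit_sparsity(weights):.3f}")
print(f"all-zero bit planes: {empty_planes(weights)}")
```

In this toy example most bits are zero even though only one weight value is zero, which is the gap between value-level and bit-level sparsity that the abstract refers to.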

ERA-BS: Boosting the Efficiency of ReRAM-based PIM Accelerator with Fine-Grained Bit-Level Sparsity
