Abstract:Resistive Random-Access-Memory (ReRAM) crossbar is one of the most promising neural network accelerators, thanks to its in-memory and in-situ analog computing abilities for Matrix Multiplication-and-Accumulations (MACs). The key limitations are: 1) the number of rows and columns of ReRAM cells for concurrent execution of MACs is constrained, resulting in limited in-memory computing throughput; 2) the cost of high-precision analog-to-digital (A/D) conversions that can offset the efficiency and performance benefits of ReRAM-based Process-In-Memory (PIM). Meanwhile, it is challenging to deploy Deep Neural Network (DNN) models with a large model size in the crossbar since the sparsity of DNNs cannot be effectively exploited in the crossbar structure, especially the sparsity in the activation. As a countermeasure, we develop a novel ReRAM-based PIM accelerator, namely ERA-BS, which pays attention to the correlation between the bit-level sparsity (in both weights and activations) and the performance of the ReRAM-based crossbar. We propose a superior bit-flip scheme combined with the exponent-based quantization, which can adaptively flip the bits of the mapped DNNs to release redundant space without sacrificing the accuracy much or incurring much hardware overhead. Meanwhile, we design an architecture that can integrate the techniques to shrink the crossbar footprint to be used massively. We further propose a dynamic activation sparsity exploitation scheme in conjunction with the tightly coupled structure nature of the crossbar, including crossbar-aware activation pruning and ancillary run-time hardware support. In such a way, we exploit fine-grained sparsity weights (static) and activations (dynamic), respectively, to improve performance while reducing the energy consumption of computation with negligible overheads. Our experiments on a wide variety of networks show that compared to the well-known ReRAM-based PIM accelerator like “ISAAC”, ERA-BS can achieve up to $43\times$ , $78\times$ , and $73\times$ in terms of energy efficiency, area-efficiency, and throughput, respectively. Compared to the state-of-the-art ReRAM-based design “PIM-Prune”, ERA-BS can also achieve $5.3\times$ energy efficiency, $7.2\times$ area efficiency, and $32\times$ performance gain with a similar or even higher accuracy.

PRAP-PIM: A Weight Pattern Reusing Aware Pruning Method for ReRAM-based PIM DNN Accelerators

A design framework for processing-in-memory accelerator

APQ: Automated DNN Pruning and Quantization for ReRAM-Based Accelerators

Re2PIM

ERA-BS: Boosting the Efficiency of ReRAM-based PIM Accelerator with Fine-Grained Bit-Level Sparsity

SDP: Co-Designing Algorithm, Dataflow, and Architecture for In-SRAM Sparse NN Acceleration

Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals

SEAL-lab Technical Report – No . 2015-001 ( April 29 , 2016 ) Processing-in-Memory in ReRAM-based Main Memory

ReMap: Reorder Mapping for Multi-level Uneven Distribution on Sparse ReRAM Accelerator.

PIMPR: PIM-based Personalized Recommendation with Heterogeneous Memory Hierarchy

SEAL-lab Technical Report – No . 2015-001 ( November 30 , 2015 ) Processing-in-Memory in ReRAM-based Main Memory

AUTO-PRUNE

NicePIM: Design Space Exploration for Processing-In-Memory DNN Accelerators with 3D-Stacked-DRAM

ReRAM-Sharing: Fine-Grained Weight Sharing for ReRAM-Based Deep Neural Network Accelerator.

Towards Efficient SRAM-PIM Architecture Design by Exploiting Unstructured Bit-Level Sparsity

Parapim: A Parallel Processing-In-Memory Accelerator For Binary-Weight Deep Neural Networks

High Area/Energy Efficiency RRAM CNN Accelerator with Pattern-Pruning-Based Weight Mapping Scheme

ReHy: A ReRAM-based Digital/Analog Hybrid PIM Architecture for Accelerating CNN Training

RePAST: A ReRAM-based PIM Accelerator for Second-order Training of DNN

VSPIM: SRAM Processing-in-Memory DNN Acceleration via Vector-Scalar Operations

PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory.