Abstract: Resistive random-access-memory (ReRAM) crossbar is a promising technique for deep neural network (DNN) accelerators, thanks to its in-memory and in-situ analog computing abilities for vector–matrix multiplication-and-accumulations (VMMs). However, it is challenging for the crossbar architecture to exploit the sparsity in DNNs. Exploiting fine-grained sparsity is inevitably complex and costly because of the tightly coupled crossbar structure. As a countermeasure, we develop a novel ReRAM-based DNN accelerator, named sparse-multiplication-engine (SME), based on a hardware and software co-design framework. First, we orchestrate the bit-sparse pattern to increase the density of bit-sparsity based on existing quantization methods. Such quantized weights can be generated using alternating direction method of multipliers (ADMM) optimization during DNN fine-tuning, which can exactly enforce bit patterns in the weights. Second, we propose a novel weight mapping mechanism that slices the bits of each weight across crossbars and splices the activation results in peripheral circuits. This mechanism decouples the tightly coupled crossbar structure and accumulates the sparsity in the crossbar. Finally, a squeeze-out scheme empties the crossbars mapped with highly sparse nonzeros from the previous two steps. We design the SME architecture and discuss its use for other quantization methods and different ReRAM cell technologies. We further propose a workload grouping algorithm and a pipeline that balance the workload among crossbar rows concurrently executing multiply–accumulate operations, in order to optimize system latency. Putting it all together, with the optimized model and compared with prior state-of-the-art designs, the SME reduces crossbar usage by up to $8.7\times$ and $2.1\times$ on ResNet-50 and MobileNet-v2, respectively, and achieves an average $3.1\times$ speedup with little or no accuracy loss on ImageNet.
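
The bit-slicing idea in the abstract can be illustrated with a small sketch. This is not the SME implementation; it only assumes unsigned 8-bit weights, one crossbar per bit position, and a shift-and-add step standing in for the peripheral splicing circuit. The function names (`bit_planes`, `vmm`, `sliced_vmm`, `empty_planes`) are hypothetical and chosen for illustration.

```python
def bit_planes(weights, n_bits=8):
    """Split a matrix of unsigned integers into n_bits binary matrices (LSB first)."""
    return [[[(w >> b) & 1 for w in row] for row in weights] for b in range(n_bits)]


def vmm(x, matrix):
    """Plain vector-matrix product: x (length R) times matrix (R x C)."""
    cols = len(matrix[0])
    return [sum(x[r] * matrix[r][c] for r in range(len(matrix))) for c in range(cols)]


def empty_planes(weights, n_bits=8):
    """Bit positions whose plane is all zero; in this toy model those crossbars are unused."""
    planes = bit_planes(weights, n_bits)
    return [b for b, plane in enumerate(planes) if not any(any(row) for row in plane)]


def sliced_vmm(x, weights, n_bits=8):
    """Compute x * weights one bit plane at a time, skipping all-zero planes."""
    out = [0] * len(weights[0])
    for b, plane in enumerate(bit_planes(weights, n_bits)):
        if not any(any(row) for row in plane):
            continue  # assumed: an empty plane needs no crossbar at all
        partial = vmm(x, plane)  # stands in for one binary crossbar
        out = [o + (p << b) for o, p in zip(out, partial)]  # shift-and-add splice
    return out


# Weights whose nonzero bits are confined to bit positions 4 and 7.
weights = [[0b00010000, 0b10000000],
           [0b10010000, 0b00000000]]
x = [3, 5]

print(sliced_vmm(x, weights))   # matches vmm(x, weights)
print(empty_planes(weights))    # bit positions that need no crossbar here
```

In this toy case only two of the eight bit planes hold any nonzero bits, so the sliced computation touches two planes and still reproduces the full-precision product. Concentrating nonzero bits into a few positions is what makes emptying whole crossbars possible.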

Sparsity-Aware Optimization of In-Memory Bayesian Binary Neural Network Accelerators

Bayes2IMC: In-Memory Computing for Bayesian Binary Neural Networks

An Efficient Channel-Aware Sparse Binarized Neural Networks Inference Accelerator

Bayesian Inference Accelerator for Spiking Neural Networks

Exploiting Near-Memory Processing Architectures for Bayesian Neural Networks Acceleration

BinSparX: Sparsified Binary Neural Networks for Reduced Hardware Non-Idealities in Xbar Arrays

Optimizing BCPNN Learning Rule for Memory Access

Shift-BNN: Highly-Efficient Probabilistic Bayesian Neural Network Training via Memory-Friendly Pattern Retrieving

An Energy-Efficient Bagged Binary Neural Network Accelerator

Energy-Efficient Machine Learning Accelerator for Binary Neural Networks

Crossbar-Aligned & Integer-Only Neural Network Compression for Efficient In-Memory Acceleration

PULSE: Parametric Hardware Units for Low-power Sparsity-Aware Convolution Engine

Design Space Exploration of Sparsity-Aware Application-Specific Spiking Neural Network Accelerators

An Approach of Binary Neural Network Energy-Efficient Implementation

Cerebron: A Reconfigurable Architecture for Spatiotemporal Sparse Spiking Neural Networks

SparseNN: An Energy-Efficient Neural Network Accelerator Exploiting Input and Output Sparsity

Cambricon-S: Addressing Irregularity in Sparse Neural Networks Through A Cooperative Software/Hardware Approach

CiM-BNN: Computing-in-MRAM Architecture for Stochastic Computing Based Bayesian Neural Network

A Precision-Scalable Deep Neural Network Accelerator with Activation Sparsity Exploitation

Efficient Computation Reduction in Bayesian Neural Networks Through Feature Decomposition and Memorization

SoBS-X: Squeeze-Out Bit Sparsity for ReRAM-Crossbar-Based Neural Network Accelerator