Abstract: Deep neural networks (DNNs) have made breakthroughs in various fields including image recognition and language processing. DNNs execute hundreds of millions of multiply-and-accumulate (MAC) operations. To efficiently accelerate such computations, analog in-memory-computing platforms have emerged that leverage emerging devices such as resistive RAM (RRAM). However, such accelerators face the hurdle of being required to have sufficient on-chip crossbars to hold all the weights of a DNN. Otherwise, RRAM cells in the crossbars need to be reprogrammed to process further layers, which causes huge time/energy overhead due to the extremely slow writing and verification of the RRAM cells. As a result, it is still not possible to deploy such accelerators to process large-scale DNNs in industry. To address this problem, we propose the BasisN framework to accelerate DNNs on any number of available crossbars without reprogramming. BasisN introduces a novel representation of the kernels in DNN layers as combinations of global basis vectors shared between all layers with quantized coefficients. These basis vectors are written to crossbars only once and used for the computations of all layers with marginal hardware modification. BasisN also provides a novel training approach to enhance computation parallelization with the global basis vectors and optimize the coefficients to construct the kernels. Experimental results demonstrate that cycles per inference and energy-delay product were reduced to below 1% compared with applying reprogramming on crossbars in processing large-scale DNNs such as DenseNet and ResNet on ImageNet and CIFAR100 datasets, while the training and hardware costs are negligible.
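
The kernel representation described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the sizes, the uniform symmetric quantizer, and the choice to apply the coefficients digitally after the crossbar projection are all assumptions made for illustration. The only ideas taken from the abstract are that every layer's kernels are combinations of one shared set of basis vectors and that the coefficients are quantized.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 16 global basis vectors, each of length 64.
num_basis, vec_len = 16, 64
# Stands in for the values written to the crossbar once and reused by all layers.
basis = rng.standard_normal((vec_len, num_basis))

def quantize(coeffs, bits=4):
    """Uniform symmetric quantization to signed integers (an assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(coeffs))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(coeffs / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def layer_output(x, q_coeffs, scale):
    """Compute x @ W for W = basis @ (q_coeffs * scale) without rewriting `basis`."""
    projections = x @ basis                  # same matrix-vector product for every layer
    return projections @ (q_coeffs * scale)  # per-layer combination of the projections

# Two hypothetical layers share `basis` and differ only in their coefficients.
x = rng.standard_normal((2, vec_len))
for num_kernels in (8, 32):
    q, s = quantize(rng.standard_normal((num_basis, num_kernels)))
    out = layer_output(x, q, s)
    # Matches a layer whose kernels were built explicitly from the basis.
    assert np.allclose(out, x @ (basis @ (q * s)))
```

Because `basis` never changes between layers in this sketch, switching layers only means switching the small coefficient matrix, which is the property that removes the need to rewrite crossbar cells.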

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper aims to solve a key problem encountered when using resistive random-access memory (RRAM)-based in-memory-computing (IMC) accelerators to process large-scale deep neural networks (DNNs): **the excessive time and energy costs of reprogramming RRAM crossbars**.

Specifically, existing RRAM-based IMC accelerators must store all the weights of a DNN in a limited number of RRAM crossbars. When these crossbars are insufficient to hold all the weights, the RRAM cells have to be reprogrammed to process subsequent layers, which incurs large time and energy costs. For example, reprogramming a 128×128 RRAM crossbar requires 10^4 to 10^5 cycles, which significantly slows down the entire chip.

In addition, although existing compression techniques can reduce the number of required crossbars, they cannot completely avoid reprogramming. For example, a large-scale DNN such as DenseNet on ImageNet would require an extremely aggressive compression ratio (less than 0.1), which even the most advanced compression methods cannot achieve.

### Solutions

To overcome these problems, the paper proposes the **BasisN framework**, whose core idea is to represent the convolution kernels of every DNN layer as combinations of global basis vectors, thereby avoiding reprogramming. The specific contributions are as follows:

1. **New kernel representation method**: BasisN represents all convolution kernels of each DNN layer as linear combinations of a set of global basis vectors. These basis vectors need to be written only once and are used for the computations of all layers.
2. **Training framework**: BasisN provides a new training method that allows the weight matrices of the DNN to be represented as combinations of the basis vectors and optimizes the combination coefficients while keeping hardware costs low.
3. **Efficient computation**: BasisN can run on any number of available crossbars without reprogramming, significantly reducing the number of inference cycles and the energy-delay product (EDP).

### Experimental results

The experimental results show that, compared with reprogramming the crossbars, the BasisN framework reduces both the cycles per inference and the energy-delay product to below 1% when processing large-scale DNNs (such as DenseNet and ResNet), without a decrease in inference accuracy and with negligible hardware costs.

### Summary

By representing kernels as combinations of global basis vectors, the BasisN framework removes the need to reprogram RRAM-based IMC accelerators when processing large-scale DNNs, significantly improving computational efficiency and energy efficiency.
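
To give a rough sense of the reprogramming overhead cited above, the following back-of-the-envelope sketch estimates the cycles spent rewriting crossbars when a model does not fit on chip. It is a simplified cost model, not the paper's evaluation method: it assumes one weight per RRAM cell, assumes rewrites happen one crossbar at a time, and uses a hypothetical model size and crossbar count. Only the 128×128 crossbar size and the 10^4 to 10^5 cycles per rewrite come from the text.

```python
import math

def reprogram_cycles(num_weights, available_crossbars,
                     crossbar_size=128, cycles_per_reprogram=10**4):
    """Return (crossbars needed, cycles spent on rewrites) under the stated assumptions."""
    cells = crossbar_size * crossbar_size          # assumed: one weight per cell
    needed = math.ceil(num_weights / cells)
    # Weights that do not fit on chip must overwrite already-programmed crossbars.
    rewrites = max(0, needed - available_crossbars)
    return needed, rewrites * cycles_per_reprogram  # assumed: rewrites are serialized

# Hypothetical example: 8 million weights on a chip with 64 crossbars.
for cost in (10**4, 10**5):
    needed, cycles = reprogram_cycles(8_000_000, 64, cycles_per_reprogram=cost)
    print(f"{needed} crossbars needed, {cycles:,} cycles spent reprogramming")
```

Under these assumptions the rewrite cost grows linearly with the number of crossbars that do not fit on chip, which is why removing reprogramming entirely, rather than only shrinking the model, is the goal stated in the summary.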

BasisN: Reprogramming-Free RRAM-Based In-Memory-Computing by Basis Combination for Deep Neural Networks

A Universal RRAM-Based DNN Accelerator with Programmable Crossbars Beyond MVM Operator

A Low-Latency DNN Accelerator Enabled by DFT-Based Convolution Execution Within Crossbar Arrays

Bit-Transformer: Transforming Bit-level Sparsity into Higher Preformance in ReRAM-based Accelerator

SoBS-X: Squeeze-Out Bit Sparsity for ReRAM-Crossbar-Based Neural Network Accelerator.

Enabling Resistive-RAM-based Activation Functions for Deep Neural Network Acceleration

RRAM-DNN: an RRAM and Model-Compression Empowered All-Weights-On-Chip DNN Accelerator

MF-Net: Compute-In-Memory SRAM for Multibit Precision Inference Using Memory-Immersed Data Conversion and Multiplication-Free Operators

High-Throughput In-Memory Computing for Binary Deep Neural Networks with Monolithically Integrated RRAM and 90nm CMOS

ReRAM-Sharing: Fine-Grained Weight Sharing for ReRAM-Based Deep Neural Network Accelerator.

Hybrid RRAM/SRAM in-Memory Computing for Robust DNN Acceleration

FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator

Re2PIM

Low Bit-Width Convolutional Neural Network on RRAM

ERA-BS: Boosting the Efficiency of ReRAM-based PIM Accelerator with Fine-Grained Bit-Level Sparsity

RRAM based learning acceleration.

CREAM: Computing in ReRAM-Assisted Energy- and Area-Efficient SRAM for Reliable Neural Network Acceleration.

LayCO: Achieving Least Lossy Accuracy for Most Efficient RRAM-Based Deep Neural Network Accelerator via Layer-Centric Co-Optimization

Parapim: A Parallel Processing-In-Memory Accelerator For Binary-Weight Deep Neural Networks

SNrram: an Efficient Sparse Neural Network Computation Architecture Based on Resistive Random-Access Memory.

RIMAC: an Array-Level ADC/DAC-Free ReRAM-Based In-Memory DNN Processor with Analog Cache and Computation.