BasisN: Reprogramming-Free RRAM-Based In-Memory-Computing by Basis Combination for Deep Neural Networks

Amro Eldebiky, Grace Li Zhang, Xunzhao Yin, Cheng Zhuo, Ing-Chao Lin, Ulf Schlichtmann, Bing Li
2024-07-04
Abstract: Deep neural networks (DNNs) have made breakthroughs in various fields including image recognition and language processing. DNNs execute hundreds of millions of multiply-and-accumulate (MAC) operations. To efficiently accelerate such computations, analog in-memory-computing platforms have emerged leveraging emerging devices such as resistive RAM (RRAM). However, such accelerators face the hurdle of being required to have sufficient on-chip crossbars to hold all the weights of a DNN. Otherwise, RRAM cells in the crossbars need to be reprogrammed to process further layers, which causes huge time/energy overhead due to the extremely slow writing and verification of the RRAM cells. As a result, it is still not possible to deploy such accelerators to process large-scale DNNs in industry. To address this problem, we propose the BasisN framework to accelerate DNNs on any number of available crossbars without reprogramming. BasisN introduces a novel representation of the kernels in DNN layers as combinations of global basis vectors shared between all layers with quantized coefficients. These basis vectors are written to crossbars only once and used for the computations of all layers with marginal hardware modification. BasisN also provides a novel training approach to enhance computation parallelization with the global basis vectors and optimize the coefficients to construct the kernels. Experimental results demonstrate that cycles per inference and energy-delay product were reduced to below 1% compared with applying reprogramming on crossbars in processing large-scale DNNs such as DenseNet and ResNet on ImageNet and CIFAR100 datasets, while the training and hardware costs are negligible.
Systems and Control, Machine Learning
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses a key problem that arises when resistive random-access memory (RRAM)-based in-memory-computing (IMC) accelerators process large-scale deep neural networks (DNNs): **the excessive time and energy cost of reprogramming RRAM crossbars**.

Existing RRAM-based IMC accelerators must hold all DNN weights in a limited number of RRAM crossbars. When the available crossbars cannot hold all the weights, the RRAM cells have to be reprogrammed to process subsequent layers, which incurs a large time and energy overhead. For example, reprogramming a 128×128 RRAM crossbar requires 10^4 to 10^5 cycles, which significantly slows down the entire chip.

Existing compression techniques reduce the number of crossbars required, but they cannot eliminate reprogramming altogether. For a large-scale DNN such as DenseNet on ImageNet, avoiding reprogramming would require an extreme compression ratio (below 0.1), which even the most advanced compression methods cannot reach.

### Solutions

To overcome these problems, the paper proposes the **BasisN framework**. Its core idea is to represent the convolution kernels of every DNN layer as combinations of global basis vectors, which removes the need for reprogramming. The specific contributions are:

1. **New kernel representation method**: BasisN represents all convolution kernels of each DNN layer as linear combinations of a set of global basis vectors shared between all layers, with quantized coefficients. The basis vectors are written to the crossbars only once and are reused for the computations of all layers.
2. **Training framework**: BasisN provides a new training method that allows the weight matrices of the DNN to be expressed as combinations of the basis vectors and optimizes the combination coefficients while keeping hardware cost low.
3. **Efficient computation**: BasisN can run on any number of available crossbars without reprogramming, significantly reducing the number of cycles per inference and the energy-delay product (EDP).

### Experimental results

The experimental results show that, compared with reprogramming the crossbars, BasisN reduces both cycles per inference and energy-delay product to below 1% when processing large-scale DNNs such as DenseNet and ResNet on the ImageNet and CIFAR100 datasets, without a decrease in inference accuracy and with negligible training and hardware costs.

### Summary

By representing kernels as combinations of global basis vectors, the BasisN framework removes the reprogramming bottleneck of RRAM-based IMC accelerators on large-scale DNNs and significantly improves computational and energy efficiency.
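
The kernel representation described under Solutions can be illustrated with a small numerical sketch. It is not the paper's implementation: the function names (`basis_mac`, `explicit_kernels`), the dimensions, and the integer value ranges are all assumptions chosen for illustration, and analog effects, coefficient training, and the hardware mechanism that applies the coefficients are left out. It only shows why MAC results computed once against fixed basis vectors can be recombined with quantized coefficients to give the same output as MACs against the full kernels.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def basis_mac(basis, coeffs, x):
    """Compute kernel outputs using only MACs against the basis vectors.

    basis  : list of m basis vectors of length d (assumed fixed in the crossbar)
    coeffs : one list of m quantized integer coefficients per kernel
    x      : input vector of length d
    """
    # Step 1: a single pass yields the inner product of x with each basis vector.
    partial = [dot(b, x) for b in basis]
    # Step 2: each kernel output is a coefficient-weighted sum of those results.
    return [dot(c, partial) for c in coeffs]


def explicit_kernels(basis, coeffs):
    """Materialize each kernel as a linear combination of basis vectors.

    Used here only as a reference to check the result of basis_mac.
    """
    d = len(basis[0])
    m = len(basis)
    return [[sum(c[i] * basis[i][k] for i in range(m)) for k in range(d)]
            for c in coeffs]


# Hypothetical sizes: 4 basis vectors of length 8, combined into 3 kernels.
rng = random.Random(0)
d, m, n_kernels = 8, 4, 3
basis = [[rng.randint(-3, 3) for _ in range(d)] for _ in range(m)]
coeffs = [[rng.randint(-2, 2) for _ in range(m)] for _ in range(n_kernels)]
x = [rng.randint(0, 5) for _ in range(d)]

y_basis = basis_mac(basis, coeffs, x)
y_direct = [dot(k, x) for k in explicit_kernels(basis, coeffs)]
assert y_basis == y_direct  # identical because the combination is linear
```

In this sketch a different layer would only supply a different `coeffs` table, while `basis` stays unchanged, which mirrors the idea of writing the basis vectors to the crossbars once and reusing them for all layers.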