ERA-BS: Boosting the Efficiency of ReRAM-based PIM Accelerator with Fine-Grained Bit-Level Sparsity
Fangxin Liu,Wenbo Zhao,Zongwu Wang,Yongbiao Chen,Xiaoyao Liang,Li Jiang
DOI: https://doi.org/10.1109/tc.2023.3290869
IF: 3.183
2024-01-01
IEEE Transactions on Computers
Abstract:Resistive Random-Access-Memory (ReRAM) crossbar is one of the most promising neural network accelerators, thanks to its in-memory and in-situ analog computing abilities for Matrix Multiplication-and-Accumulations (MACs). The key limitations are: 1) the number of rows and columns of ReRAM cells for concurrent execution of MACs is constrained, resulting in limited in-memory computing throughput; 2) the cost of high-precision analog-to-digital (A/D) conversions that can offset the efficiency and performance benefits of ReRAM-based Process-In-Memory (PIM). Meanwhile, it is challenging to deploy Deep Neural Network (DNN) models with a large model size in the crossbar since the sparsity of DNNs cannot be effectively exploited in the crossbar structure, especially the sparsity in the activation. As a countermeasure, we develop a novel ReRAM-based PIM accelerator, namely ERA-BS, which pays attention to the correlation between the bit-level sparsity (in both weights and activations) and the performance of the ReRAM-based crossbar. We propose a superior bit-flip scheme combined with the exponent-based quantization, which can adaptively flip the bits of the mapped DNNs to release redundant space without sacrificing the accuracy much or incurring much hardware overhead. Meanwhile, we design an architecture that can integrate the techniques to shrink the crossbar footprint to be used massively. We further propose a dynamic activation sparsity exploitation scheme in conjunction with the tightly coupled structure nature of the crossbar, including crossbar-aware activation pruning and ancillary run-time hardware support. In such a way, we exploit fine-grained sparsity weights (static) and activations (dynamic), respectively, to improve performance while reducing the energy consumption of computation with negligible overheads. Our experiments on a wide variety of networks show that compared to the well-known ReRAM-based PIM accelerator like “ISAAC”, ERA-BS can achieve up to $43\times$ , $78\times$ , and $73\times$ in terms of energy efficiency, area-efficiency, and throughput, respectively. Compared to the state-of-the-art ReRAM-based design “PIM-Prune”, ERA-BS can also achieve $5.3\times$ energy efficiency, $7.2\times$ area efficiency, and $32\times$ performance gain with a similar or even higher accuracy.