PQ-PIM: A Pruning–quantization Joint Optimization Framework for ReRAM-based Processing-in-memory DNN Accelerator

Yuhao Zhang, Xinyu Wang, Xikun Jiang, Yuhan Yang, Zhaoyan Shen, Zhiping Jia
DOI: https://doi.org/10.1016/j.sysarc.2022.102531
IF: 5.836
2022-01-01
Journal of Systems Architecture
Abstract: Pruning and quantization are two efficient techniques for improving the performance and energy efficiency of ReRAM-based DNN accelerators. However, most existing ReRAM-based DNN accelerators that use pruning and quantization assume an over-idealized multi-bit ReRAM crossbar and neglect practical structural constraints. Because process technology is still immature, actual matrix–vector multiplication must be conducted at a smaller operation unit (OU) granularity with single-bit ReRAM cells. In this paper, we propose an efficient pruning–quantization joint exploration framework for practical ReRAM-based DNN accelerators, termed PQ-PIM. It consists of a patch-wise pruning–quantization algorithm, based on patch importance analysis, that compresses DNN models, and a configurable mixed-OU-based single-bit ReRAM DNN engine that supports the algorithm with better performance and energy efficiency. Experimental results show that PQ-PIM achieves up to 1.74× performance improvement, 62% energy saving, and a 5.84× compression ratio of occupied crossbars compared to state-of-the-art ReRAM-based DNN accelerators.
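To make the OU constraint in the abstract concrete, the following is a minimal software sketch of a matrix–vector multiplication carried out one operation unit at a time over single-bit cells. It is not the PQ-PIM engine or its algorithm. The names `bit_slice` and `ou_mvm`, the 2×2 OU size, the 4-bit unsigned weights, and the rule of skipping all-zero OUs are all illustrative assumptions, and inputs are treated as plain integers rather than being fed bit-serially.

```python
def bit_slice(weights, bits):
    """Split unsigned integer weights into `bits` single-bit planes (LSB first).

    Each plane stands in for a set of single-bit ReRAM cells (assumption).
    """
    return [[[(w >> b) & 1 for w in row] for row in weights]
            for b in range(bits)]


def ou_mvm(weights, x, bits=4, ou_rows=2, ou_cols=2):
    """Compute y[j] = sum_i x[i] * weights[i][j], one OU at a time.

    weights: list of rows (one per input / wordline), columns are outputs.
    Returns (y, number of OUs actually activated).
    """
    n_rows, n_cols = len(weights), len(weights[0])
    y = [0] * n_cols
    activated = 0
    for b, plane in enumerate(bit_slice(weights, bits)):
        for r0 in range(0, n_rows, ou_rows):
            for c0 in range(0, n_cols, ou_cols):
                rows = range(r0, min(r0 + ou_rows, n_rows))
                cols = range(c0, min(c0 + ou_cols, n_cols))
                # Illustrative assumption: an OU whose cells are all zero
                # (for example after pruning) contributes nothing and is skipped.
                if not any(plane[i][j] for i in rows for j in cols):
                    continue
                activated += 1
                for j in cols:
                    partial = sum(x[i] * plane[i][j] for i in rows)
                    # Shift-and-add recombines the single-bit planes.
                    y[j] += partial << b
    return y, activated


if __name__ == "__main__":
    W = [[3, 0], [1, 2], [0, 0], [0, 5]]  # 4 inputs x 2 outputs, hypothetical values
    x = [1, 2, 3, 4]
    y, used = ou_mvm(W, x)
    dense = [sum(x[i] * W[i][j] for i in range(len(W))) for j in range(len(W[0]))]
    print(y == dense, used)
```

In this toy model, the number of activated OUs falls as more weights and bit planes become zero, which gives an intuition for why pruning and quantization choices interact with OU size. Any real cost or energy relationship depends on hardware details that this sketch does not model.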
What problem does this paper attempt to address?