PIM-Prune: Fine-Grain DCNN Pruning for Crossbar-Based Process-In-Memory Architecture

Chaoqun Chu, Yanzhi Wang, Yilong Zhao, Xiaolong Ma, Shaokai Ye, Yunyan Hong, Xiaoyao Liang, Yinhe Han, Li Jiang
DOI: https://doi.org/10.1109/DAC18072.2020.9218523
2020-01-01
Abstract: Deep convolutional neural network (DCNN) pruning is an efficient way to reduce resource and power consumption in a DCNN accelerator. However, exploiting the sparsity in the weight matrices of DCNNs is nontrivial when these DCNNs are deployed on a crossbar-based Process-In-Memory (PIM) architecture, because of the crossbar structure. Structural pruning that exploits coarse-grained sparsity, such as filter/channel-level pruning, can produce a compressed weight matrix that fits the crossbar structure, but it inevitably degrades model accuracy. To solve this problem, we propose PIM-PRUNE, which exploits finer-grained sparsity in PIM architectures; the resulting compressed weight matrices significantly reduce the number of crossbars required, with negligible accuracy loss. Further, we explore the crossbar design space, such as crossbar size and aspect ratio, from the new viewpoint of resource-oriented pruning. We find a trade-off between the pruning algorithm and the hardware overhead: a PIM with smaller crossbars is more friendly to pruning, but the resulting peripheral circuits consume more power. Given a specific DCNN, we can suggest a sweet spot in crossbar design that achieves the optimal overall energy efficiency. Experimental results show that the proposed pruning method applied to ResNet18 achieves up to 24.85× and 3.56× higher compression rates of occupied crossbars on CIFAR-10 and ImageNet, respectively, with negligible accuracy loss; these results are 4.56× and 1.99× better than state-of-the-art methods.
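To make the idea of a "compression rate of occupied crossbars" more concrete, here is a minimal Python sketch that counts how many crossbar-sized tiles of a pruned weight matrix still hold nonzero weights. It is a simplified counting model, not the PIM-PRUNE algorithm. It assumes that the weight matrix is tiled directly into xbar_rows × xbar_cols blocks with no row or column reordering, and that an all-zero block needs no crossbar. The function names (occupied_crossbars, dense_crossbars, crossbar_compression_rate) are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def occupied_crossbars(weights: Matrix, xbar_rows: int, xbar_cols: int) -> int:
    """Count crossbar tiles that contain at least one nonzero weight.

    Illustrative assumption: the matrix is cut into xbar_rows x xbar_cols
    tiles in place (no reordering), and an all-zero tile needs no crossbar.
    """
    n_rows = len(weights)
    n_cols = len(weights[0]) if n_rows else 0
    count = 0
    for r0 in range(0, n_rows, xbar_rows):
        for c0 in range(0, n_cols, xbar_cols):
            # A tile is occupied if any weight inside it survived pruning.
            has_weight = any(
                weights[r][c] != 0
                for r in range(r0, min(r0 + xbar_rows, n_rows))
                for c in range(c0, min(c0 + xbar_cols, n_cols))
            )
            if has_weight:
                count += 1
    return count


def dense_crossbars(n_rows: int, n_cols: int, xbar_rows: int, xbar_cols: int) -> int:
    """Crossbars needed to map the unpruned matrix (partial tiles round up)."""
    return math.ceil(n_rows / xbar_rows) * math.ceil(n_cols / xbar_cols)


def crossbar_compression_rate(weights: Matrix, xbar_rows: int, xbar_cols: int) -> float:
    """Ratio of crossbars for the dense matrix to crossbars still occupied."""
    n_rows = len(weights)
    n_cols = len(weights[0]) if n_rows else 0
    dense = dense_crossbars(n_rows, n_cols, xbar_rows, xbar_cols)
    occupied = occupied_crossbars(weights, xbar_rows, xbar_cols)
    return dense / occupied if occupied else float("inf")


# Toy 8x8 matrix in which only two weights survive pruning.
w = [[0.0] * 8 for _ in range(8)]
w[0][0] = 0.3
w[5][6] = -0.7
for size in (8, 4, 2):
    print(size, crossbar_compression_rate(w, size, size))
# prints:
# 8 1.0
# 4 2.0
# 2 8.0
```

In this toy example, the same two surviving weights release more crossbars as the crossbar shrinks. This mirrors the abstract's observation that smaller crossbars are friendlier to pruning. The sketch does not model the higher peripheral-circuit power that the abstract identifies as the opposing cost.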