SDP: Co-Designing Algorithm, Dataflow, and Architecture for In-SRAM Sparse NN Acceleration
Fengbin Tu, Yiqi Wang, Ling Liang, Yufei Ding, Leibo Liu, Shaojun Wei, Shouyi Yin, Yuan Xie
DOI: https://doi.org/10.1109/tcad.2022.3172600
IF: 2.9
2023-01-01
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Abstract: Processing-in-memory (PIM) is a promising architecture for neural network (NN) acceleration. Most previous PIMs are based on analog computing, so their accuracy and memory cell array utilization are limited by analog deviation and ADC overhead. Digital PIM is an emerging type of PIM architecture that integrates digital logic in memory cells, which can fully utilize the cell array without accuracy loss. However, digital PIM’s rigid crossbar architecture and full array activation raise new challenges in sparse NN acceleration. Conventional unstructured or structured sparsity cannot perform well on both the weight and input sides of digital PIM. We exploit the opportunities offered by digital PIM’s bit-serial processing and in-memory customization to tackle these challenges by co-designing the sparse algorithm, multiplication dataflow, and PIM architecture. At the algorithm level, we propose double-broadcast hybrid-grained pruning to exploit weight sparsity with a better balance between accuracy and efficiency. At the dataflow level, we propose a bit-serial Booth in-SRAM multiplication dataflow for stable acceleration from the input side. At the architecture level, we design a sparse digital PIM (SDP) accelerator with customized SRAM-PIM macros to support the proposed techniques. SDP achieves $3.59\times$, $8.15\times$, and $3.11\times$ higher area efficiency, and $6.95\times$, $29.44\times$, and $39.40\times$ energy savings, over the state-of-the-art sparse NN architectures SIGMA, SRE, and Bit Prudent, respectively.
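The abstract does not spell out the bit-serial Booth dataflow, so the following is only a minimal software sketch of textbook radix-2 Booth recoding, not the paper's in-SRAM implementation. It illustrates why a Booth-recoded input can be processed one digit per step and why zero digits contribute nothing to the product. The function names `booth_digits` and `booth_multiply`, and the assumption that zero digits could be skipped, are illustrative choices that do not come from the source.

```python
def booth_digits(x: int, bits: int) -> list[int]:
    """Radix-2 Booth recoding of a two's-complement integer.

    Returns `bits` digits in {-1, 0, +1}, least significant first,
    such that sum(d * 2**i) == x.
    """
    if not -(1 << (bits - 1)) <= x < (1 << (bits - 1)):
        raise ValueError("x does not fit in the given bit width")
    u = x & ((1 << bits) - 1)  # two's-complement bit pattern of x
    digits = []
    prev = 0  # implicit bit to the right of the LSB
    for i in range(bits):
        cur = (u >> i) & 1
        digits.append(prev - cur)  # Booth digit: x[i-1] - x[i]
        prev = cur
    return digits


def booth_multiply(weight: int, x: int, bits: int) -> tuple[int, int]:
    """Bit-serial multiply of `weight` by input `x`, one Booth digit per step.

    Returns (product, nonzero_steps). Skipping zero digits is an
    assumption made for illustration only.
    """
    acc = 0
    nonzero_steps = 0
    for i, d in enumerate(booth_digits(x, bits)):
        if d == 0:
            continue  # a zero digit adds nothing to the accumulator
        acc += d * (weight << i)  # add or subtract the shifted weight
        nonzero_steps += 1
    return acc, nonzero_steps


if __name__ == "__main__":
    for x in (3, -1, 6, -8):
        print(x, booth_digits(x, 8), booth_multiply(5, x, 8))
```

In this sketch, an input such as -1 has all eight bits set in 8-bit two's complement yet recodes to a single nonzero Booth digit, which is one intuition for how Booth recoding can even out the amount of work across inputs.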