Sense: Model-Hardware Codesign for Accelerating Sparse CNNs on Systolic Arrays

Wenhao Sun, Deng Liu, Zhiwei Zou, Wendi Sun, Song Chen, Yi Kang
DOI: https://doi.org/10.1109/tvlsi.2023.3241933
2023-01-01
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Abstract: Sparsity is an intrinsic property of convolutional neural networks (CNNs) and is worth exploiting in CNN accelerators. However, the extra processing it requires adds hardware overhead, so most architectures see only marginal gains. Meanwhile, systolic arrays have become increasingly competitive for CNN acceleration because of their high spatiotemporal locality and low hardware overhead. Yet the irregularity of sparsity induces imbalanced workloads under the rigid systolic dataflow, degrading performance. This article therefore proposes Sense, a systolic-array-based architecture for sparse CNN acceleration built through model-hardware codesign, enabling large performance gains. To balance input feature map (IFM) and weight loads across the processing element (PE) array, we apply channel clustering to gather IFMs of similar sparsity for array computation, and we codesign a load-balancing weight pruning method that keeps the sparsity ratio of each kernel at a fixed value with little accuracy loss, improving PE utilization and overall performance. In addition, adaptive dataflow configuration determines the computing strategy from the storage ratio of IFMs and weights, reducing dynamic random access memory (DRAM) access by $1.17\times$–$1.8\times$ compared with Swallow and further reducing system energy consumption. The whole design is implemented on a Zynq ZCU102 at 200 MHz and achieves 471, 34, 53, and 191 image/s for AlexNet, VGG-16, ResNet-50, and GoogleNet, respectively. Compared with the sparse systolic-array-based accelerators Swallow, fusion-enabled systolic architecture (FESA), and SPOTS, Sense achieves $0.97\times$–$2.18\times$, $1.3\times$–$1.67\times$, and $0.94\times$–$1.82\times$ the energy efficiency (image/J) on these CNNs, respectively.
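The two load-balancing ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no algorithmic details, so the magnitude-based pruning criterion, the sort-by-zero-fraction grouping, and the names `prune_kernel`, `balanced_prune`, and `cluster_channels` are all assumptions made for illustration.

```python
from typing import List


def prune_kernel(weights: List[float], sparsity: float) -> List[float]:
    """Zero the smallest-magnitude weights of one flattened kernel.

    Assumption: magnitude is the pruning criterion (not stated in the abstract).
    """
    n_prune = round(sparsity * len(weights))
    # Indices sorted by ascending magnitude; the first n_prune get zeroed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def balanced_prune(kernels: List[List[float]], sparsity: float) -> List[List[float]]:
    """Prune every kernel to the same sparsity ratio, so each kernel
    contributes the same number of nonzero weights to the array."""
    return [prune_kernel(k, sparsity) for k in kernels]


def cluster_channels(channels: List[List[float]], group_size: int) -> List[List[int]]:
    """Group channel indices so channels with similar zero fractions
    end up in the same group (hypothetical stand-in for channel clustering)."""
    def zero_fraction(ch: List[float]) -> float:
        return sum(1 for v in ch if v == 0) / len(ch)

    order = sorted(range(len(channels)), key=lambda i: zero_fraction(channels[i]))
    return [order[i:i + group_size] for i in range(0, len(order), group_size)]


kernels = [[0.9, -0.1, 0.4, 0.05], [0.2, 0.3, -0.8, 0.01]]
pruned_kernels = balanced_prune(kernels, 0.5)

ifm_channels = [[0, 0, 0, 1], [1, 2, 3, 4], [0, 0, 1, 1], [0, 1, 1, 1]]
groups = cluster_channels(ifm_channels, 2)
```

In this toy setting, every pruned kernel keeps the same number of nonzero weights, and each channel group holds IFMs with neighboring sparsity levels, which is the property the abstract credits for improved PE utilization.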