WRA-SS: A High-Performance Accelerator Integrating Winograd with Structured Sparsity for Convolutional Neural Networks

Chen Yang, Yishuo Meng, Jiawei Xi, Siwei Xiang, Jianfei Wang, Kuizhi Mei
DOI: https://doi.org/10.1109/tvlsi.2023.3330993
2024-01-01
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Abstract: Sparsification for convolutional neural networks (CNNs) and convolution acceleration algorithms such as the Winograd algorithm are two efficient ways to reduce the intensive computations of existing CNNs. To better combine sparsification with the Winograd algorithm, a close integration method is proposed to dynamically reduce the invalid parameters that follow the Winograd transformation. To address the limitation of data bandwidth, a hierarchical two-level storage structure and a corresponding data scheduling scheme are proposed, which together realize a conflict-free scheduling process. In addition, an algorithm-hardware codesign method is proposed to efficiently and flexibly reduce the invalid computations caused by the previous filter decomposition method. The accelerator is evaluated on a Xilinx XCVU9P FPGA, reaching a 412-MHz clock frequency. Compared to state-of-the-art designs, WRA-SS achieves 1.54–5.33× and 1.17–7.39× performance improvement for VGG-16 under 80% weight sparsity and 0% weight sparsity, respectively.
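To make the idea of combining Winograd with sparsity concrete, the following is a minimal sketch of the standard 1-D Winograd F(2,3) algorithm, which produces two outputs of a 3-tap filter with four multiplications instead of six. Skipping multiplications whose transformed weight is zero is added here purely as an illustration of why zeros in the Winograd domain save work; it is an assumption for this example and not the integration method, storage structure, or scheduling scheme of WRA-SS. All function names are hypothetical.

```python
def transform_filter(g):
    """Filter transform U = G g for Winograd F(2,3); g has 3 taps."""
    g0, g1, g2 = g
    return [g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2]


def transform_input(d):
    """Input transform V = B^T d for Winograd F(2,3); d has 4 samples."""
    d0, d1, d2, d3 = d
    return [d0 - d2, d1 + d2, d2 - d1, d1 - d3]


def winograd_f23(d, u):
    """Compute 2 outputs from 4 inputs and a pre-transformed filter u.

    Multiplications with a zero transformed weight are skipped
    (illustrative assumption, not the paper's hardware scheme).
    Returns the outputs and the number of multiplications performed.
    """
    v = transform_input(d)
    m = [0.0] * 4
    mults = 0
    for i in range(4):
        if u[i] != 0:  # zero in the Winograd domain: no multiply needed
            m[i] = u[i] * v[i]
            mults += 1
    # Output transform Y = A^T m
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]], mults


def direct_conv(d, g):
    """Reference: direct sliding-window computation of the same 2 outputs."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]
    for g in ([1.0, 2.0, 3.0], [1.0, 0.0, -1.0]):
        y, mults = winograd_f23(d, transform_filter(g))
        print(g, "->", y, "multiplications:", mults, "direct:", direct_conv(d, g))
```

Note that a filter that is sparse in the spatial domain is not necessarily sparse after the filter transform, which is one reason sparsification and Winograd do not combine trivially.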