An Operation-Minimized FPGA Accelerator Design by Dynamically Exploiting Sparsity in CNN Winograd Transform

Xinkai Di, Haigang Yang, Zhihong Huang, Ning Mao
DOI: https://doi.org/10.1109/socc46988.2019.1570558495
2019-01-01
Abstract: To address the high computational complexity of deep convolutional neural networks (CNNs), both the fast Winograd transform algorithm and sparsity exploitation have been used to reduce hardware operation overhead. Yet previous studies have mainly concentrated on the fixed sparsity patterns of the weight filters. In this paper, we focus specifically on exploiting the varying sparsity patterns in the input/output activations of the Winograd-transformed network. To this end, a dynamic compression approach for multiplication with a matrix of changing sparsity is proposed. The processing flow is built around data indexing and restoring. Because they are generated dynamically during inference, the inputs/outputs depend heavily on the actual data being processed. Unlike the static pattern of a weight matrix, which requires only offline compression, the dynamic matrix pattern is handled by a real-time compression processor module that updates the inputs/outputs online within the FPGA's Block RAMs. In the next layer's computation, only the valid data needs to be restored by following the necessary index information and broadcast to the corresponding sparse weight matrices, which in turn generates the next batch of inputs/outputs. The design implements a typical CNN such as VGG on a Xilinx Virtex-7 FPGA device for verification and achieves an overall performance of 629.4 GOPS. Preliminary experimental results also demonstrate a 2.2 times (up to 5.5 times) improvement in equivalent GOPS per DSP block with our adaptive sparsity exploitation approach, compared to conventional counterparts.
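The index-and-restore flow described in the abstract can be illustrated in software with a minimal sketch. This is not the paper's hardware design: the function names (`compress`, `sparse_multiply`, `restore`), the flat-list tile layout, and the example values are all assumptions made for illustration. It relies only on the fact that the multiplication stage in the Winograd domain is element-wise, so a product can be skipped whenever either the activation or the weight at a position is zero.

```python
def compress(tile):
    """Keep only the nonzero activations together with their positions.

    Hypothetical software stand-in for the real-time compression step.
    """
    indices = [i for i, v in enumerate(tile) if v != 0]
    values = [tile[i] for i in indices]
    return values, indices


def sparse_multiply(values, indices, weights):
    """Element-wise product over valid activations only.

    A multiplication is issued only when the weight at the same position
    is also nonzero. Returns the compressed products and the number of
    multiplications actually performed.
    """
    out_values, out_indices = [], []
    mults = 0
    for v, i in zip(values, indices):
        w = weights[i]
        if w != 0:
            out_values.append(v * w)
            out_indices.append(i)
            mults += 1
    return out_values, out_indices, mults


def restore(values, indices, length):
    """Rebuild a dense tile of the given length from the index information."""
    dense = [0] * length
    for v, i in zip(values, indices):
        dense[i] = v
    return dense


# Example: a flattened transformed tile with data-dependent zeros (made-up values).
activations = [0, 3, 0, 0, 5, 0, 2, 0]
weights = [1, 0, 4, 0, 2, 0, 3, 1]  # statically sparse weights (made-up values)

act_values, act_indices = compress(activations)
prod_values, prod_indices, mults = sparse_multiply(act_values, act_indices, weights)
result = restore(prod_values, prod_indices, len(activations))

# Dense reference: one multiplication per position.
reference = [a * w for a, w in zip(activations, weights)]
```

In this example the dense reference issues eight multiplications, while the compressed path issues two and produces the same tile. On an FPGA the indices would be stored alongside the values in Block RAM rather than in Python lists; the encoding used by the authors is not specified in the abstract.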