USCA: A Unified Systolic Convolution Array Architecture for Accelerating Sparse Neural Network

Wenjian Liu, Jun Lin, Zhongfeng Wang
DOI: https://doi.org/10.1109/iscas.2019.8702132
2019-01-01
Abstract: Due to the high computational complexity of CNNs and the variety of convolution types they use, it is challenging to implement different CNN models on specific hardware. Many previous works focus on data reuse and sparsity exploitation to accelerate computation but fail to support various types of convolution efficiently. When dealing with variants of conventional convolution, such as deconvolution or dilated convolution, previous accelerators waste time padding zeros and convolving with the padded feature maps. In this paper, we propose a unified convolution algorithm that combines several convolution types into a single formulation and exploits the sparsity in activations. The proposed algorithm allows the padding process to be skipped. Moreover, a unified systolic convolution array (USCA) architecture is developed based on the algorithm. The USCA architecture is implemented in TSMC 28nm CMOS technology. Implementation results show that the architecture costs 206k logic gates and 114.7 kB of on-chip memory. It reaches a peak performance of 374.7 GOPS and consumes 201.1 mW at a frequency of 1449 MHz. Compared to similar works, the USCA architecture achieves 3× higher energy efficiency, measured in GOPS per watt. In addition, to the best of our knowledge, USCA is the first architecture that can efficiently support conventional convolution, deconvolution, and dilated convolution simultaneously.
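The abstract does not spell out the unified algorithm itself. The following is a minimal 1-D sketch, assuming the standard textbook definitions of transposed (de)convolution and dilated convolution, that illustrates why the zero padding is avoidable. A padded deconvolution inserts zeros between inputs and multiplies through them, while a scatter formulation touches only nonzero activations. A dilated convolution can simply index the input with a stride instead of inflating the kernel with zeros. All function names are hypothetical, and this is not the authors' USCA dataflow.

```python
# Hypothetical 1-D illustration (not the USCA implementation): padded
# reference versions vs. zero-skipping versions of deconvolution and
# dilated convolution, assuming standard definitions of both.

def conv1d_valid(x, w):
    """Plain 'valid' correlation: y[i] = sum_j x[i + j] * w[j]."""
    n, k = len(x), len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(n - k + 1)]

def deconv1d_padded(x, w, stride):
    """Reference transposed convolution: insert (stride - 1) zeros between
    inputs, pad (K - 1) zeros on each side, correlate with the flipped kernel."""
    k = len(w)
    up = []
    for i, v in enumerate(x):
        up.append(v)
        if i != len(x) - 1:
            up.extend([0] * (stride - 1))
    padded = [0] * (k - 1) + up + [0] * (k - 1)
    return conv1d_valid(padded, w[::-1])

def deconv1d_scatter(x, w, stride):
    """Zero-skipping transposed convolution: each nonzero input x[i] adds
    x[i] * w[j] to output i * stride + j. No padded map is built, and zero
    activations are skipped. Returns (output, number of multiplications)."""
    k = len(w)
    y = [0] * ((len(x) - 1) * stride + k)
    macs = 0
    for i, v in enumerate(x):
        if v == 0:
            continue  # exploit activation sparsity
        for j in range(k):
            y[i * stride + j] += v * w[j]
            macs += 1
    return y, macs

def dilated_padded(x, w, d):
    """Reference dilated convolution: insert (d - 1) zeros between kernel taps."""
    wd = []
    for j, t in enumerate(w):
        wd.append(t)
        if j != len(w) - 1:
            wd.extend([0] * (d - 1))
    return conv1d_valid(x, wd)

def dilated_direct(x, w, d):
    """Zero-skipping dilated convolution: read the input with stride d."""
    span = (len(w) - 1) * d + 1
    return [sum(x[i + j * d] * w[j] for j in range(len(w)))
            for i in range(len(x) - span + 1)]

if __name__ == "__main__":
    x = [1, 0, 2, 0, 3]  # sparse activations
    w = [1, 2, 3]
    out, macs = deconv1d_scatter(x, w, stride=2)
    assert out == deconv1d_padded(x, w, stride=2)
    print(out, macs)  # prints [1, 2, 3, 0, 2, 4, 6, 0, 3, 6, 9] 9

    x2 = [1, 2, 3, 4, 5, 6, 7]
    assert dilated_direct(x2, [1, 1, 1], 2) == dilated_padded(x2, [1, 1, 1], 2)
```

In this toy example the padded deconvolution performs 11 outputs × 3 taps = 33 multiplications, whereas the scatter version performs 9 and produces the same result. The sketch only illustrates the arithmetic equivalence; how USCA maps such a formulation onto a systolic array is described in the paper, not here.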