A Novel Low-Communication Energy-Efficient Reconfigurable CNN Acceleration Architecture

Di Wu, Jin Chen, Wei Cao, Lingli Wang
DOI: https://doi.org/10.1109/fpl.2018.00019
2018-01-01
Abstract: The Winograd algorithm is an efficient way to reduce the computational burden of deep CNNs. First, we combine a fast matrix algorithm with the Winograd algorithm to further reduce computational complexity, and we adapt the Winograd algorithm to large-stride convolutions with a kernel-partitioning method. Second, the efficiency gained from these fast algorithms puts more pressure on off-chip communication, and the DRAM access of different data-flows varies significantly across CNN patterns. Dynamically configuring both the data-flow and the on-chip shared memory reduces DRAM access effectively, and a quantitative analysis of the design space is established to guide these configurations. Finally, we present a reconfigurable architecture that supports three categories of data-flows. For evaluation, VGGNet16, AlexNet and ResNet50 are implemented, each achieving state-of-the-art DSP efficiency. On the ZC706 platform, the design achieves an overall performance of 685.6 GOP/s, 1250 GOP/s and 507 GOP/s for AlexNet, VGGNet16 and ResNet50, respectively, with better energy efficiency than representative prior works.
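The abstract names two arithmetic techniques without spelling them out. As a rough, hypothetical illustration only (not the authors' design, which operates on 2-D tiles in hardware and also folds in a fast matrix algorithm that is not reproduced here), the Python sketch below shows the standard Winograd minimal-filtering algorithm F(2,3) in one dimension, which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. It then shows one common way to partition a stride-2 kernel: split it into even and odd taps so that each piece becomes a stride-1 convolution that Winograd can handle. The function names, the 1-D setting, and the 6-tap example kernel are assumptions chosen for clarity.

```python
"""Hypothetical 1-D sketch: Winograd F(2,3) and stride-2 kernel partitioning.

Not the paper's implementation; names and sizes here are illustrative.
"""


def direct_conv1d(d, g, stride=1):
    """Reference 'valid' 1-D correlation, the convolution used in CNN layers."""
    k = len(g)
    n_out = (len(d) - k) // stride + 1
    return [sum(d[i * stride + j] * g[j] for j in range(k)) for i in range(n_out)]


def winograd_f23(d, g):
    """F(2,3): 2 outputs of a 3-tap filter from 4 inputs using 4 multiplications."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform; in hardware this would be precomputed once per filter.
    u = (g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2)
    # Input transform: additions only.
    v = (d0 - d2, d1 + d2, d2 - d1, d1 - d3)
    # Element-wise products: the only multiplications (4 instead of 6).
    m0, m1, m2, m3 = (ui * vi for ui, vi in zip(u, v))
    # Output transform: additions only.
    return [m0 + m1 + m2, m1 - m2 - m3]


def winograd_conv1d(d, g):
    """Stride-1 correlation tiled into F(2,3) blocks (needs an even output count)."""
    n_out = len(d) - len(g) + 1
    assert len(g) == 3 and n_out % 2 == 0, "sketch only handles 3 taps, even outputs"
    out = []
    for i in range(0, n_out, 2):
        # Adjacent 4-input tiles overlap by 2 inputs.
        out.extend(winograd_f23(d[i:i + 4], g))
    return out


def stride2_by_partition(d, g, conv_s1=direct_conv1d):
    """Stride-2 correlation rewritten as two stride-1 correlations.

    y[i] = sum_m d[2i+2m] g[2m] + sum_m d[2i+2m+1] g[2m+1], so the even taps
    only see even-indexed inputs and the odd taps only see odd-indexed inputs.
    """
    n_out = (len(d) - len(g)) // 2 + 1
    y_even = conv_s1(d[0::2], g[0::2])
    y_odd = conv_s1(d[1::2], g[1::2]) if len(g) > 1 else [0] * n_out
    # A sub-convolution may yield an extra trailing output; keep the first n_out.
    return [y_even[i] + y_odd[i] for i in range(n_out)]


if __name__ == "__main__":
    d = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, -5, 8]
    g6 = [1, -2, 3, 0, 2, -1]  # 6 taps -> two 3-tap sub-kernels
    print(stride2_by_partition(d, g6, winograd_conv1d))
    print(direct_conv1d(d, g6, stride=2))
```

Running the file prints the partitioned Winograd result next to the direct stride-2 result so the two can be compared. The kernel is split by tap parity because, with stride 2, even-indexed taps only ever touch even-indexed inputs; whether the paper partitions large-stride kernels exactly this way is not stated in the abstract.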