Efficient Convolution Architectures for Convolutional Neural Network

Jichen Wang, Jun Lin, Zhongfeng Wang
DOI: https://doi.org/10.1109/wcsp.2016.7752726
2016-01-01
Abstract: Convolutional neural networks (CNNs) are a state-of-the-art deep learning approach employed in various applications due to their remarkable performance. Convolutions generally dominate the overall computational complexity of a CNN and thus consume most of the computational power in real implementations. This paper discusses efficient hardware architectures that incorporate the parallel fast finite impulse response (FIR) algorithm (FFA) for CNN convolution. The theoretical derivation of the 3-parallel and 5-parallel FFAs is presented, and the corresponding 3-parallel and 5-parallel fast convolution units (FCUs) are proposed for the 3 × 3 and 5 × 5 convolutional kernels most commonly used in CNNs, respectively. Compared to conventional CNN convolution architectures, the proposed FCUs significantly reduce the number of multiplications used in convolutions. Additionally, the FCUs minimize the number of reads from the feature map memory. Furthermore, a reconfigurable FCU architecture that suits convolutions with both 3 × 3 and 5 × 5 kernels is proposed. Based on this, an efficient top-level architecture for processing a complete convolutional layer in a CNN is developed. To quantify the benefits of the proposed FCUs, an FCU design is coded in RTL and synthesized with TSMC 90 nm CMOS technology. The implementation results demonstrate that 30% and 36% of the computational energy can be saved compared to conventional solutions for 3 × 3 and 5 × 5 kernels, respectively.
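To illustrate why a parallel FFA reduces the multiplication count, the following is a minimal Python sketch of the textbook 3-parallel FFA applied to a single 3-tap filter, such as one row of a 3 × 3 kernel. It is not taken from the paper: the function names `fir_direct` and `fir_ffa3`, the zero initial state, and the block-by-block software formulation are assumptions made for illustration, and the paper's FCU hardware may be organized differently. The sketch produces each block of three outputs with 6 multiplications, where the direct form needs 9.

```python
def fir_direct(h, x):
    """Reference 3-tap FIR with zero initial state:
    y[n] = h[0]*x[n] + h[1]*x[n-1] + h[2]*x[n-2]."""
    y = []
    for n in range(len(x)):
        acc = 0
        for i, hi in enumerate(h):
            if n - i >= 0:
                acc += hi * x[n - i]
        y.append(acc)
    return y


def fir_ffa3(h, x):
    """Textbook 3-parallel FFA for a 3-tap filter (illustrative sketch).

    Returns (outputs, multiplication_count). Each block of three inputs
    yields three outputs using 6 multiplications instead of 9.
    """
    assert len(h) == 3 and len(x) % 3 == 0
    h0, h1, h2 = h
    # Coefficient sums are fixed per kernel and can be precomputed.
    h01, h12, h012 = h0 + h1, h1 + h2, h0 + h1 + h2

    # Products carried over from the previous block (the z^-3 delay).
    prev_m1 = prev_m2 = prev_m12 = 0
    y, mults = [], 0

    for k in range(0, len(x), 3):
        x0, x1, x2 = x[k:k + 3]

        # The six multiplications of the 3-parallel FFA.
        m0 = h0 * x0
        m1 = h1 * x1
        m2 = h2 * x2
        m01 = h01 * (x0 + x1)
        m12 = h12 * (x1 + x2)
        m012 = h012 * (x0 + x1 + x2)
        mults += 6

        # Post-additions recover the three outputs of this block.
        y0 = m0 + (prev_m12 - prev_m1 - prev_m2)   # h0*x0 + delayed (h1*x2 + h2*x1)
        y1 = (m01 - m0 - m1) + prev_m2             # h0*x1 + h1*x0 + delayed h2*x2
        y2 = m012 - m01 - m12 + 2 * m1             # h0*x2 + h1*x1 + h2*x0
        y.extend([y0, y1, y2])

        prev_m1, prev_m2, prev_m12 = m1, m2, m12

    return y, mults


if __name__ == "__main__":
    h = [1, 2, 3]
    x = [1, 2, 3, 4, 5, 6]
    y_fast, mults = fir_ffa3(h, x)
    print(y_fast == fir_direct(h, x), mults)
```

The saving comes from trading multiplications for additions: the cross terms are recovered by subtracting products that are already available, which is attractive in hardware because multipliers typically cost more area and energy than adders. The multiplication count here is only an operation-level illustration and should not be read as a derivation of the energy figures reported in the abstract.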