Reconfigurable and Low-Complexity Accelerator for Convolutional and Generative Networks over Finite Fields

Weihong Xu,Zaichen Zhang,Xiaohu You,Chuan Zhang
DOI: https://doi.org/10.1109/tcad.2020.2973355
IF: 2.9
2020-01-01
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Abstract:Convolutional neural networks (CNNs) have gained great success in various fields, such as computer vision and natural language processing. Besides, with the breakthrough in unsupervised learning, generative adversarial network (GAN) is recently utilized to generate virtual data from limited data sets. The generative model of GAN has impressive applications, such as style transfer and image super-resolution. However, the promising performance of CNN and GAN comes at the cost of prohibitive computation complexity. The convolution (CONV) in CNN and the transposed CONV (TCONV) in GAN are the two operations that dominant the overall complexity. The prior works exploit the fast algorithms, Winograd and fast Fourier transform (FFT), to reduce the complexity of spatial CONV. However, Winograd only supports fixed filter size while FFT has high transform overhead. Moreover, very few works apply fast algorithms to accelerate GAN models. In this article, a reconfigurable and low-complexity accelerator on ASIC for both CNN and GAN is proposed to address these problems. First, by exploiting Fermat number transform (FNT), we propose two FNT-based fast algorithms to reduce the complexity of CONV and TCONV computations, respectively. Then the architectures of the FNT-based accelerator are presented to implement the proposed fast algorithms. The methodology to determine the design parameters and optimize the dataflow is also described for obtaining maximum performance and optimal efficiency. Moreover, we implement the proposed accelerator on 65 nm 1P9M technology and evaluate it on various CNN and GAN models. The post-layout results show that our design achieves a throughput of 288.0 GOP/s on VGG-16 with 25.11 GOP/s/mm2 area efficiency, which is superior to the state-of-the-art CNN accelerators. Furthermore, at least $1.7\times $ speedup over the existing accelerators is obtained on GAN. The resulting energy efficiency is $275.3\times $ and $12.5\times $ of CPU and GPU.
What problem does this paper attempt to address?