Abstract: Convolutional neural networks (CNNs) have achieved great success in various fields, such as computer vision and natural language processing. In addition, following breakthroughs in unsupervised learning, generative adversarial networks (GANs) have recently been used to generate synthetic data from limited datasets. The generative models of GANs have impressive applications, such as style transfer and image super-resolution. However, the promising performance of CNNs and GANs comes at the cost of prohibitive computational complexity. The convolution (CONV) in CNNs and the transposed CONV (TCONV) in GANs are the two operations that dominate the overall complexity. Prior works exploit fast algorithms, namely Winograd and the fast Fourier transform (FFT), to reduce the complexity of spatial CONV. However, Winograd supports only a fixed filter size, while FFT incurs a high transform overhead. Moreover, very few works apply fast algorithms to accelerate GAN models. In this article, we propose a reconfigurable, low-complexity ASIC accelerator for both CNNs and GANs to address these problems. First, by exploiting the Fermat number transform (FNT), we propose two FNT-based fast algorithms that reduce the complexity of CONV and TCONV computations, respectively. Then, we present the architecture of the FNT-based accelerator that implements the proposed fast algorithms. We also describe the methodology for determining the design parameters and optimizing the dataflow to obtain maximum performance and optimal efficiency. Finally, we implement the proposed accelerator in 65-nm 1P9M technology and evaluate it on various CNN and GAN models. Post-layout results show that our design achieves a throughput of 288.0 GOP/s on VGG-16 with an area efficiency of 25.11 GOP/s/mm², which is superior to state-of-the-art CNN accelerators. Furthermore, it obtains at least $1.7\times$ speedup over existing accelerators on GAN models. Its energy efficiency is $275.3\times$ and $12.5\times$ that of a CPU and a GPU, respectively.
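To make the idea of FNT-based CONV and TCONV concrete, the sketch below computes 1-D integer convolutions through a textbook Fermat number transform. It is not the authors' algorithm: the abstract does not give their transform length, Fermat number, or tiling scheme, so the choices here are assumptions. These are t = 4 (modulus F_4 = 2^16 + 1 = 65537, a prime), transform roots that are powers of two, and a direct O(N²) transform for readability. Powers-of-two roots are what make the FNT attractive in hardware, because every twiddle multiplication becomes a bit shift and reduction modulo F_t is cheap since 2^(2^t) ≡ −1. The helper names `fnt`, `ifnt`, `fnt_linear_conv`, `conv_valid`, and `tconv` are hypothetical.

```python
# Minimal sketch of FNT-based CONV and TCONV (1-D, integer data).
# Assumptions (not taken from the paper): t = 4, so the modulus is the Fermat
# prime F_4 = 2**16 + 1, and every transform root is a power of two.

T = 4
MOD = 2 ** (2 ** T) + 1      # F_4 = 65537
MAX_LEN = 2 ** (T + 1)       # 2 has multiplicative order 32 modulo F_4


def fnt(x, root):
    """Direct O(N^2) FNT: X[k] = sum_n x[n] * root**(n*k) mod F_4."""
    n = len(x)
    return [sum(x[i] * pow(root, i * k, MOD) for i in range(n)) % MOD
            for k in range(n)]


def ifnt(X, root):
    """Inverse FNT: x[n] = N^-1 * sum_k X[k] * root**(-n*k) mod F_4."""
    n = len(X)
    inv_root = pow(root, MOD - 2, MOD)  # MOD is prime: Fermat's little theorem
    inv_n = pow(n, MOD - 2, MOD)
    return [v * inv_n % MOD for v in fnt(X, inv_root)]


def to_signed(v):
    """Map a residue in [0, F_4) back to a signed value in [-32768, 32768]."""
    return v - MOD if v > MOD // 2 else v


def fnt_linear_conv(x, w):
    """Full linear convolution of integer sequences; exact while |y[n]| <= 32768."""
    out_len = len(x) + len(w) - 1
    n = 1
    while n < out_len:           # zero-pad to a power of two so cyclic == linear
        n *= 2
    if n > MAX_LEN:
        raise ValueError("too long for a power-of-two root modulo F_4")
    root = pow(2, MAX_LEN // n, MOD)  # element of order n, still a power of two
    X = fnt([v % MOD for v in x] + [0] * (n - len(x)), root)
    W = fnt([v % MOD for v in w] + [0] * (n - len(w)), root)
    y = ifnt([a * b % MOD for a, b in zip(X, W)], root)  # pointwise product
    return [to_signed(v) for v in y[:out_len]]


def conv_valid(x, w):
    """CNN-style CONV (cross-correlation): y[i] = sum_j x[i + j] * w[j]."""
    full = fnt_linear_conv(x, w[::-1])
    return full[len(w) - 1:len(x)]


def tconv(x, w, stride):
    """1-D TCONV: y[n] = sum_i x[i] * w[n - stride*i], length (L-1)*stride + K."""
    up = []
    for i, v in enumerate(x):   # insert stride-1 zeros between inputs
        up.append(v)
        if i < len(x) - 1:
            up.extend([0] * (stride - 1))
    return fnt_linear_conv(up, w)


if __name__ == "__main__":
    print(conv_valid([1, 2, 3, 4], [1, 0, -1]))   # [-2, -2]
    print(tconv([1, 2], [1, 1, 1], 2))            # [1, 1, 3, 2, 2]
```

The results are exact only while every output fits in the signed residue range of F_4. This dynamic-range limit is inherent to finite-field transforms and is why the word length and Fermat number must be chosen together. The zero-insertion form of `tconv` is the simplest correct formulation, but it spends work on the inserted zeros. The abstract does not describe how the authors' TCONV algorithm handles this.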

Reconfigurable and Low-Complexity Accelerator for Convolutional and Generative Networks over Finite Fields