Abstract: Today, convolutional and deconvolutional neural network models are exceptionally popular thanks to the impressive accuracy they have demonstrated in several computer-vision applications. To speed up the overall tasks of these neural networks, purpose-designed accelerators are highly desirable. Unfortunately, the high computational complexity and the huge memory demand still make the design of efficient hardware architectures, as well as their deployment in resource- and power-constrained embedded systems, quite challenging. This paper presents a novel purpose-designed hardware accelerator to perform 2D deconvolutions. The proposed structure applies a hardware-oriented computational approach that overcomes the issues of traditional deconvolution methods, and it is suitable for implementation within virtually any system-on-chip based on field-programmable gate array (FPGA) devices. In fact, the novel accelerator is easily scalable to match the resources available within both high- and low-end devices by adequately scaling the adopted parallelism. As an example, when exploited to accelerate the Deep Convolutional Generative Adversarial Network (DCGAN) model, the novel accelerator, running as a standalone unit implemented within the Xilinx Zynq XC7Z020 System-on-Chip (SoC) device, performs up to 72 GOPS. Moreover, it dissipates less than 500 mW at 200 MHz and occupies 5.6%, 4.1%, 17%, and 96% of the look-up tables, flip-flops, random access memory, and digital signal processors available on-chip, respectively. When accommodated within the same device, the whole embedded system equipped with the novel accelerator performs up to 54 GOPS and dissipates less than 1.8 W at 150 MHz. Thanks to the greater parallelism that can be exploited, more than 900 GOPS can be executed when the high-end Virtex-7 XC7VX690T device is used as the implementation platform.
Compared with state-of-the-art competitors implemented within the Zynq XC7Z045 device, the system proposed here reaches up to 20% higher computational capability while reducing power consumption and logic resource requirements by more than 60% and 80%, respectively, and using 5.7× fewer on-chip memory resources.
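To make concrete what a 2D deconvolution (transposed convolution) computes, and why a traditional formulation can be costly, the following is a minimal single-channel Python reference model. It is not the accelerator's hardware-oriented approach, which the abstract does not detail. Instead, it compares zero insertion followed by ordinary convolution (assumed here as a representative "traditional" method) with a direct input-oriented (scatter) formulation. It assumes a square K×K kernel, stride S, and no padding or output cropping; the function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def scatter_deconv(x: Matrix, w: Matrix, stride: int) -> Tuple[Matrix, int]:
    """Input-oriented transposed convolution; returns (output, MAC count).

    Output size is ((H - 1) * S + K) x ((W - 1) * S + K).
    """
    h, wd, k = len(x), len(x[0]), len(w)
    oh, ow = (h - 1) * stride + k, (wd - 1) * stride + k
    y = [[0.0] * ow for _ in range(oh)]
    macs = 0
    for i in range(h):
        for j in range(wd):
            for p in range(k):
                for q in range(k):
                    # Each input pixel scatters a scaled copy of the kernel;
                    # windows overlap when K > S, so outputs accumulate.
                    y[i * stride + p][j * stride + q] += x[i][j] * w[p][q]
                    macs += 1
    return y, macs


def zero_insertion_deconv(x: Matrix, w: Matrix, stride: int) -> Tuple[Matrix, int]:
    """Zero-insert (S - 1) zeros between pixels, pad by K - 1,
    then correlate with the kernel rotated by 180 degrees."""
    h, wd, k = len(x), len(x[0]), len(w)
    uh, uw = (h - 1) * stride + 1, (wd - 1) * stride + 1
    ph, pw = uh + 2 * (k - 1), uw + 2 * (k - 1)
    up = [[0.0] * pw for _ in range(ph)]
    for i in range(h):
        for j in range(wd):
            up[i * stride + k - 1][j * stride + k - 1] = x[i][j]
    oh, ow = ph - k + 1, pw - k + 1
    y = [[0.0] * ow for _ in range(oh)]
    macs = 0
    for r in range(oh):
        for c in range(ow):
            acc = 0.0
            for p in range(k):
                for q in range(k):
                    # Most of these products multiply inserted zeros.
                    acc += up[r + p][c + q] * w[k - 1 - p][k - 1 - q]
                    macs += 1
            y[r][c] = acc
    return y, macs


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]
    w = [[1.0, 0.0, -1.0], [2.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
    y1, m1 = scatter_deconv(x, w, stride=2)
    y2, m2 = zero_insertion_deconv(x, w, stride=2)
    print(y1 == y2)  # True
    print(m1, m2)    # 36 225
```

Both functions return the same 5×5 output, but the scatter form performs exactly H·W·K² = 36 multiply-accumulates, whereas the zero-insertion form performs 225, most of them on inserted zeros. The price of the scatter form is that overlapping output windows must be accumulated, which a hardware implementation has to handle explicitly.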