Abstract: Convolutional neural networks (CNNs) have been widely employed for image recognition because they can achieve high accuracy by emulating the behavior of optic nerves in living creatures. Recently, the rapid growth of modern applications based on deep learning algorithms has further driven research and implementations. In particular, various accelerators for deep CNNs have been proposed on FPGA platforms, which offer high performance, reconfigurability, and fast development cycles. Although current FPGA accelerators have demonstrated better performance than generic processors, the accelerator design space has not been well exploited. One critical problem is that the computation throughput may not match the memory bandwidth provided by an FPGA platform. Consequently, existing approaches cannot achieve the best performance because they under-utilize either logic resources or memory bandwidth. At the same time, the increasing complexity and scalability of deep learning applications aggravate this problem. To overcome it, we propose an analytical design scheme using the roofline model. For any solution of a CNN design, we quantitatively analyze its computing throughput and required memory bandwidth using various optimization techniques, such as loop tiling and transformation. Then, with the help of the roofline model, we can identify the solution with the best performance and the lowest FPGA resource requirement. As a case study, we implement a CNN accelerator on a VC707 FPGA board and compare it to previous approaches. Our implementation achieves a peak performance of 61.62 GFLOPS at a 100 MHz working frequency, which significantly outperforms previous approaches.
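The selection step described in the abstract can be illustrated with a minimal Python sketch of the standard roofline bound: attainable throughput is the smaller of the computational roof and the product of the computation-to-communication (CTC) ratio and the memory bandwidth. The `Design` class, the candidate names, and every number below are hypothetical and are not taken from the paper. Breaking ties by lowest required bandwidth is an assumption, used here as a stand-in for the "lowest FPGA resource requirement" criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """One hypothetical design point (e.g. one choice of loop tile sizes)."""
    name: str
    compute_roof_gflops: float  # peak throughput the logic resources allow
    ctc_flop_per_byte: float    # computation-to-communication ratio


def attainable_gflops(d: Design, bandwidth_gbs: float) -> float:
    """Roofline bound: min(computational roof, CTC ratio * bandwidth)."""
    return min(d.compute_roof_gflops, d.ctc_flop_per_byte * bandwidth_gbs)


def required_bandwidth_gbs(d: Design, bandwidth_gbs: float) -> float:
    """Bandwidth needed to sustain the attainable throughput."""
    return attainable_gflops(d, bandwidth_gbs) / d.ctc_flop_per_byte


def best_design(designs, bandwidth_gbs: float) -> Design:
    """Highest attainable throughput; ties go to the lowest bandwidth need
    (an assumed proxy for the cheapest design)."""
    return max(
        designs,
        key=lambda d: (
            attainable_gflops(d, bandwidth_gbs),
            -required_bandwidth_gbs(d, bandwidth_gbs),
        ),
    )


# Made-up candidates and platform bandwidth, for illustration only.
candidates = [
    Design("A", compute_roof_gflops=80.0, ctc_flop_per_byte=10.0),
    Design("B", compute_roof_gflops=60.0, ctc_flop_per_byte=20.0),
    Design("C", compute_roof_gflops=60.0, ctc_flop_per_byte=40.0),
]
platform_bandwidth_gbs = 4.0

if __name__ == "__main__":
    choice = best_design(candidates, platform_bandwidth_gbs)
    print(choice.name, attainable_gflops(choice, platform_bandwidth_gbs))
```

With these made-up numbers, design A has the highest computational roof but is memory-bound at 40 GFLOPS. Designs B and C both reach their 60 GFLOPS roof, and the sketch selects C because it needs the least bandwidth to do so.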
