Abstract: Convolutional Neural Networks (CNNs) have a major impact on our society because of the numerous services they provide. However, they require considerable computing power. Graphics processing units (GPUs) can satisfy these requirements, but their high power consumption and limited external I/Os constrain their usability and suitability in industrial and mission-critical scenarios. Recently, the number of research works that use FPGAs to implement CNNs has been increasing rapidly, owing to the lower power consumption and easy reconfigurability these platforms offer. As research effort goes into topics such as architecture, synthesis, and optimization, new challenges arise in integrating such hardware solutions with high-level machine learning software libraries. This paper introduces an integrated framework, CNN2Gate, that supports compiling a CNN model for an FPGA target. CNN2Gate exploits the OpenCL synthesis workflow for FPGAs offered by commercial vendors. It can parse CNN models from several popular high-level machine learning libraries, such as Keras, PyTorch, and Caffe2. CNN2Gate extracts the computation flow of the layers, along with their weights and biases, and applies a "given" fixed-point quantization. It then writes this information in the format expected by OpenCL synthesis tools, which are used to build and run the project on the FPGA. CNN2Gate performs design-space exploration using a reinforcement learning agent and automatically fits the design onto different FPGAs with limited logic resources. This paper reports results of automatic synthesis and design-space exploration of AlexNet and VGG-16 on various Intel FPGA platforms. CNN2Gate achieves a latency of 205 ms for VGG-16 and 18 ms for AlexNet on the FPGA.
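
The abstract states that CNN2Gate applies a "given" fixed-point quantization to the extracted weights and biases. The sketch below shows what such a conversion can look like, assuming a signed two's-complement format with `total_bits` total bits and `frac_bits` fractional bits, round-half-away-from-zero, and saturation on overflow. The function names and the 8-bit / 6-fraction-bit defaults are hypothetical illustrations and are not taken from CNN2Gate.

```python
import math
from typing import Iterable, List


def quantize_fixed_point(values: Iterable[float],
                         total_bits: int = 8,
                         frac_bits: int = 6) -> List[int]:
    """Map floats to signed fixed-point integer codes (hypothetical helper).

    Assumptions: two's-complement format, round half away from zero,
    and saturation to the representable range instead of wrap-around.
    """
    scale = 1 << frac_bits                   # one code step = 2**-frac_bits
    q_min = -(1 << (total_bits - 1))         # most negative code, e.g. -128
    q_max = (1 << (total_bits - 1)) - 1      # most positive code, e.g. 127
    codes = []
    for v in values:
        # Round the magnitude half away from zero, then restore the sign.
        q = math.floor(abs(v) * scale + 0.5)
        q = q if v >= 0 else -q
        # Saturate values that fall outside the representable range.
        codes.append(max(q_min, min(q_max, q)))
    return codes


def dequantize_fixed_point(codes: Iterable[int], frac_bits: int = 6) -> List[float]:
    """Convert fixed-point codes back to floats (hypothetical helper)."""
    scale = 1 << frac_bits
    return [c / scale for c in codes]


if __name__ == "__main__":
    weights = [0.5, -0.25, 3.0, -3.0, 0.01]
    codes = quantize_fixed_point(weights)
    print(codes)                          # [32, -16, 127, -128, 1]
    print(dequantize_fixed_point(codes))  # [0.5, -0.25, 1.984375, -2.0, 0.015625]
```

With these defaults the representable range is [-2, 127/64], so a weight of 3.0 saturates to code 127 (1.984375). Choosing the number of fractional bits per layer trades dynamic range against resolution.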

CNN2Gate: Toward Designing a General Framework for Implementation of Convolutional Neural Networks on FPGA

Exploring the Programmability for Deep Learning Processors: from Architecture to Tensorization

Achieving Super-Linear Speedup across Multi-FPGA for Real-Time DNN Inference

FPGA Implementations of 3D-SIMD Processor Architecture for Deep Neural Networks Using Relative Indexed Compressed Sparse Filter Encoding Format and Stacked Filters Stationary Flow

F-CNN: an FPGA-based Framework for Training Convolutional Neural Networks.

HARDWARE ACCELERATOR: IMPLEMENTATION OF CNN ON FPGA FOR DIGIT RECOGNITION

FPGA-based implementation of deep neural network using stochastic computing

Acceleration of Deep Neural Network Training Using Field Programmable Gate Arrays