Abstract: The rapid growth of data size and accessibility in recent years has instigated a shift of philosophy in algorithm design for artificial intelligence. Rather than relying on hand-engineered algorithms, researchers can now learn composable systems automatically from massive amounts of data, which has led to ground-breaking performance in important domains such as computer vision, speech recognition, and natural language processing. The most popular class of techniques used in these domains is called deep learning, and it is receiving significant attention from industry. However, these models require very large amounts of data and compute power to train, and are limited by the need for better hardware acceleration to accommodate scaling beyond current data and model sizes. While the current solution has been to use clusters of graphics processing units (GPUs) as general-purpose processors (GPGPU), field programmable gate arrays (FPGAs) provide an interesting alternative. Current trends in design tools for FPGAs have made them more compatible with the high-level software practices typically used in the deep learning community, making FPGAs more accessible to those who build and deploy models. Since FPGA architectures are flexible, this could also allow researchers to explore model-level optimizations beyond what is possible on fixed architectures such as GPUs. In addition, FPGAs tend to provide high performance per watt of power consumption, which is of particular importance for application scientists interested in large-scale server-based deployment or resource-limited embedded applications. This review takes a look at deep learning and FPGAs from a hardware acceleration perspective, identifying trends and innovations that make these technologies a natural fit, and motivates a discussion on how FPGAs may best serve the needs of the deep learning community moving forward.

Accelerating Deep Neuroevolution on Distributed FPGAs for Reinforcement Learning Problems

A Neuroevolution Approach to General Atari Game Playing

Neuroevolving Electronic Dynamical Networks

Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning

Human-Level Control Through Directly-Trained Deep Spiking Q-Networks

CLAN: Continuous Learning using Asynchronous Neuroevolution on Commodity Edge Devices

Neuroevolution of Recurrent Architectures on Control Tasks

Playing Atari Games with Deep Reinforcement Learning and Human Checkpoint Replay

The implementation of a Deep Recurrent Neural Network Language Model on a Xilinx FPGA

Brain-Inspired Hardware for Artificial Intelligence: Accelerated Learning in a Physical-Model Spiking Neural Network

A Power Efficient Neural Network Implementation on Heterogeneous FPGA and GPU Devices

Accelerating Neural-ODE Inference on FPGAs with Two-Stage Structured Pruning and History-based Stepsize Search

An Efficient Application of Neuroevolution for Competitive Multiagent Learning

Deep Learning on FPGAs: Past, Present, and Future

Generative Adversarial Neuroevolution for Control Behaviour Imitation

NeuroLGP-SM: Scalable Surrogate-Assisted Neuroevolution for Deep Neural Networks

Active Dendrites Enable Efficient Continual Learning in Time-To-First-Spike Neural Networks

Playing Atari with Deep Reinforcement Learning

E3NE: An End-to-End Framework for Accelerating Spiking Neural Networks with Emerging Neural Encoding on FPGAs

neuroAIx-Framework: design of future neuroscience simulation systems exhibiting execution of the cortical microcircuit model 20× faster than biological real-time

AlphaGo Policy Network: A DCNN Accelerator on FPGA