Abstract: With the rapid development of Artificial Intelligence, the Internet of Things, 5G, and other technologies, a number of emerging intelligent applications have appeared, represented by image recognition, voice recognition, autonomous driving, and intelligent manufacturing. These applications require efficient and intelligent processing systems for massive data calculations, so there is an urgent need to deploy better DNNs in a faster way. Although FPGAs offer a higher energy efficiency ratio than GPUs, and a shorter development cycle and better flexibility than ASICs, they are not a perfect hardware platform for computational intelligence either. This paper surveys the latest acceleration work on commonly used DNNs and proposes three new directions to break the bottleneck of DNN implementation. To improve the computing speed and energy efficiency of edge devices, intelligent embedded approaches, including model compression and optimized data movement across the entire system, are most commonly used. With the gradual slowdown of Moore’s Law, the traditional von Neumann architecture runs into a “Memory Wall” problem, which results in higher power consumption. In-memory computation will be the right remedy in the post-Moore’s-Law era. A more complete software/hardware co-design environment will direct researchers’ attention toward exploring deep learning algorithms and running them faster at the hardware level. These new directions open a relatively new paradigm in computational intelligence, which has attracted substantial attention from the research community and demonstrated greater potential than traditional techniques.

Mobile or FPGA? A Comprehensive Evaluation on Energy Efficiency and a Unified Optimization Framework

Deep Learning on Mobile and Embedded Devices: State-of-the-art, Challenges, and Future Directions

Condense: A Framework for Device and Frequency Adaptive Neural Network Models on the Edge

All-in-One: A Highly Representative DNN Pruning Framework for Edge Devices with Dynamic Power Management

FPGA based cellular neural network optimization: from design space to system

DACO: Pursuing Ultra-low Power Consumption Via DNN-Adaptive CPU-GPU CO-optimization on Mobile Devices

A Multi-Level-Optimization Framework for FPGA-Based Cellular Neural Network Implementation

MOC: Multi-Objective Mobile CPU-GPU Co-Optimization for Power-Efficient DNN Inference

Challenges in Energy-Efficient Deep Neural Network Training with FPGA

EF-Train: Enable Efficient On-device CNN Training on FPGA Through Data Reshaping for Online Adaptation or Personalization

A Power Efficient Neural Network Implementation on Heterogeneous FPGA and GPU Devices

A generic deep learning architecture optimization method for edge device based on start-up latency reduction

An SSD-MobileNet Acceleration Strategy for FPGAs Based on Network Compression and Subgraph Fusion

Research on Convolutional Neural Network Inference Acceleration and Performance Optimization for Edge Intelligence

An Overview of Energy-Efficient Hardware Accelerators for On-Device Deep-Neural-Network Training

Accelerating Mobile Applications at the Network Edge with Software-Programmable FPGAs

New paradigm of FPGA-based computational intelligence from surveying the implementation of DNN accelerators

Towards Ultra-High Performance and Energy Efficiency of Deep Learning Systems: An Algorithm-Hardware Co-Optimization Framework

Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning and Compiler Optimization

Power-Driven DNN Dataflow Optimization on FPGA