Abstract: Deep neural networks (DNNs) have recently achieved impressive success across a wide range of real-world vision and language processing tasks, from image classification to downstream vision tasks such as object detection, tracking, and segmentation. However, well-established DNNs, while maintaining superior accuracy, have grown deeper and wider and thus demand prohibitive computational resources for both training and inference. This trend widens the gap between computation-intensive DNNs and resource-constrained embedded computing systems, making it challenging to deploy powerful DNNs on real-world embedded computing systems towards ubiquitous embedded intelligence. To narrow this gap and enable ubiquitous embedded intelligence, this survey discusses recent efficient deep learning infrastructures for embedded computing systems, spanning from training to inference, from manual to automated, from convolutional neural networks to transformers, from transformers to vision transformers, from vision models to large language models, from software to hardware, and from algorithms to applications. Specifically, we discuss these infrastructures through the lens of (1) efficient manual network design, (2) efficient automated network design, (3) efficient network compression, (4) efficient on-device learning, (5) efficient large language models, (6) efficient deep learning software and hardware, and (7) efficient intelligent applications, each for embedded computing systems.

Efficient Deep Learning Infrastructures for Embedded Computing Systems: A Comprehensive Survey and Future Envision