Abstract: Motivated by the prospects of 5G communications and the industrial Internet of Things (IoT), recent years have seen the rise of a new computing paradigm, edge computing, which shifts data analytics to network edges close to big data sources. Although deep neural networks (DNNs) have been used extensively across many platforms and scenarios, they are usually both compute- and memory-intensive. They are therefore difficult to deploy on resource-limited edge devices and in performance-demanding edge applications. Hence, there is an urgent need for techniques that let DNN models fit onto edge devices while keeping execution cost and inference accuracy acceptable. This article proposes an on-demand DNN model inference system for industrial edge devices, called knowledge distillation and early exit on edge (EdgeKE). It focuses on two design knobs: first, DNN compression based on knowledge distillation, which trains compact edge models under the supervision of large, complex models to improve accuracy and speed; second, DNN acceleration based on early exit, which provides flexible choices for satisfying the distinct latency or accuracy requirements of edge applications. Extensive evaluations on the CIFAR100 dataset across three state-of-the-art edge devices show that EdgeKE significantly outperforms the baseline models in inference latency and memory footprint while maintaining competitive classification accuracy. Furthermore, EdgeKE is shown to adapt efficiently to application requirements on inference performance: the accuracy loss stays within 4.84% under various latency constraints, and the speedup ratio reaches up to 3.30× under various accuracy requirements.
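
The abstract names two knobs, knowledge distillation and early exit, but does not give EdgeKE's exact training loss or exit-selection rule. The following is a minimal sketch under stated assumptions: it uses the common Hinton-style distillation loss (hard-label cross-entropy plus temperature-scaled KL divergence to the teacher) and a simple profile-based rule for choosing an exit under a latency budget or an accuracy floor. The names `distillation_loss`, `select_exit`, and `ExitProfile`, the default temperature and alpha, and the profile numbers are all hypothetical illustrations, not EdgeKE's implementation or measurements.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      label: int,
                      temperature: float = 4.0,
                      alpha: float = 0.5) -> float:
    """Generic Hinton-style KD loss for one sample (assumed, not EdgeKE's).

    alpha weights the hard-label cross-entropy of the student; (1 - alpha)
    weights KL(teacher_T || student_T) on softened distributions, scaled by
    T^2 so its magnitude stays comparable across temperatures.
    """
    hard_ce = -math.log(softmax(student_logits)[label])
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(t * (math.log(t) - math.log(s)) for t, s in zip(p_t, p_s) if t > 0)
    return alpha * hard_ce + (1 - alpha) * temperature ** 2 * kl


@dataclass
class ExitProfile:
    index: int          # position of the exit branch in the network
    latency_ms: float   # latency measured on the target device (hypothetical)
    accuracy: float     # validation accuracy when inference stops here


def select_exit(profiles: Sequence[ExitProfile],
                latency_budget_ms: Optional[float] = None,
                min_accuracy: Optional[float] = None) -> Optional[ExitProfile]:
    """Hypothetical exit-selection rule.

    Keeps exits that satisfy every given constraint. With only an accuracy
    floor, returns the fastest feasible exit; otherwise returns the most
    accurate one. Returns None when no exit is feasible.
    """
    feasible = [
        p for p in profiles
        if (latency_budget_ms is None or p.latency_ms <= latency_budget_ms)
        and (min_accuracy is None or p.accuracy >= min_accuracy)
    ]
    if not feasible:
        return None
    if min_accuracy is not None and latency_budget_ms is None:
        return min(feasible, key=lambda p: p.latency_ms)
    return max(feasible, key=lambda p: p.accuracy)


if __name__ == "__main__":
    # Illustrative profile only; these are NOT measurements from EdgeKE.
    profiles = [ExitProfile(0, 4.0, 0.52),
                ExitProfile(1, 7.5, 0.63),
                ExitProfile(2, 12.0, 0.70)]
    print(select_exit(profiles, latency_budget_ms=8.0).index)  # -> 1
    print(select_exit(profiles, min_accuracy=0.60).index)      # -> 1
    print(distillation_loss([2.0, 1.0, 0.1], [3.0, 0.5, 0.2], label=0))
```

The selection rule mirrors the two modes the abstract describes (a latency constraint or an accuracy requirement); a real system would obtain the per-exit latency and accuracy by profiling each branch on the target edge device.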

Memory-efficient Deep Learning Inference with Incremental Weight Loading and Data Layout Reorganization on Edge Systems

Towards Memory-Efficient Inference in Edge Video Analytics

Overcoming Memory Constraint for Improved Target Classification Performance on Embedded Deep Learning Systems

Understanding Sensor Data Using Deep Learning Methods on Resource-Constrained Edge Devices

Memory-Efficient and Secure DNN Inference on TrustZone-enabled Consumer IoT Devices

FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices

EdgeKE: An On-Demand Deep Learning IoT System for Cognitive Big Data on Industrial Edge Devices

Conflict-Resilient Incremental Offloading of Deep Neural Networks to the Edge of Smart Environment

Efficient Memory Management for Deep Neural Net Inference

Joint Architecture Design and Workload Partitioning for DNN Inference on Industrial IoT Clusters

Edge-Cloud Cooperation for DNN Inference Via Reinforcement Learning and Supervised Learning

EdgeCI: Distributed Workload Assignment and Model Partitioning for CNN Inference on Edge Clusters

Model Parallelism Optimization for Distributed DNN Inference on Edge Devices

Design and Implementation of Deep Neural Network for Edge Computing

3U-EdgeAI: Ultra-Low Memory Training, Ultra-Low Bitwidth Quantization, and Ultra-Low Latency Acceleration

Accelerating DNN Inference by Edge-Cloud Collaboration

An Online Approach for DNN Model Caching and Processor Allocation in Edge Computing

Enabling Deep Learning on Edge Devices

Containing Analog Data Deluge at Edge through Frequency-Domain Compression in Collaborative Compute-in-Memory Networks

EdgeLD: Locally Distributed Deep Learning Inference on Edge Device Clusters

Efficient Continual Learning with Low Memory Footprint For Edge Device