Abstract: In recent years, deep neural networks (DNNs) have driven a boom in artificial intelligence Internet of Things applications with stringent demands for both high accuracy and low latency. A widely adopted solution is to process such computation-intensive DNN inference tasks with edge computing. Nevertheless, existing edge-based DNN processing methods still cannot achieve acceptable performance because of the large volume of transmitted data and unnecessary computation. To address these limitations, we take advantage of multi-exit DNNs (ME-DNNs), which allow tasks to exit early at different depths of the DNN during inference, depending on input complexity. However, naively deploying ME-DNNs at the edge still fails to deliver fast and consistent inference in the wild. Specifically, 1) at the model level, unsuitable exit settings add computational overhead and lead to excessive queuing delay; 2) at the computation level, it is hard to sustain high performance consistently in a dynamic edge computing environment. In this paper, we present a Low Latency Edge Intelligence Scheme based on Multi-Exit DNNs (LEIME) to tackle these problems. At the model level, we propose an exit setting algorithm that automatically builds optimal ME-DNNs with lower time complexity. At the computation level, we present a distributed offloading mechanism that fine-tunes task dispatching at runtime to sustain high performance in the dynamic environment, with a close-to-optimal performance guarantee. Finally, we implement a prototype system and extensively evaluate it through testbed and large-scale simulation experiments. Experimental results demonstrate that LEIME significantly improves application performance, achieving a 1.1–18.7× speedup across different situations.

Enabling Low Latency Edge Intelligence Based on Multi-exit DNNs in the Wild
