Abstract: As a key technology for enabling Artificial Intelligence (AI) applications in the 5G era, Deep Neural Networks (DNNs) have quickly attracted widespread attention. However, running computation-intensive DNN-based tasks on mobile devices is challenging because of their limited computation resources. Worse still, traditional cloud-assisted DNN inference is severely hindered by significant wide-area network latency, which leads to poor real-time performance and a low quality of user experience. To address these challenges, in this paper we propose Edgent, a framework that leverages edge computing for collaborative DNN inference through device-edge synergy. Edgent exploits two design knobs: (1) DNN partitioning, which adaptively partitions computation between the device and the edge in order to coordinate the powerful edge resource with the local device resource for real-time DNN inference; and (2) DNN right-sizing, which further reduces computing latency by exiting inference early at an appropriate intermediate DNN layer. In addition, to cope with potential network fluctuation in real-world deployments, Edgent is designed to handle both static and dynamic network environments. Specifically, in a static environment where the bandwidth changes slowly, Edgent derives the best configurations with the assistance of regression-based prediction models, while in a dynamic environment where the bandwidth varies dramatically, Edgent generates the best execution plan through an online change point detection algorithm that maps the current bandwidth state to the optimal configuration. We implement an Edgent prototype on a Raspberry Pi and a desktop PC, and extensive experimental evaluations demonstrate Edgent's effectiveness in enabling on-demand, low-latency edge intelligence.
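The static-environment decision in the abstract, choosing an exit point (DNN right-sizing) together with a device/edge partition point (DNN partitioning), can be illustrated as a brute-force search over every (exit, partition) pair. This is a minimal sketch, not Edgent's implementation. It assumes that per-layer device and edge latencies have already been predicted (for example by the regression-based models mentioned above), that the goal is to pick the most accurate exit whose end-to-end latency meets a deadline, and that returning the final result from the edge costs negligible time. `ExitBranch`, `plan`, `transfer_ms` and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ExitBranch:
    """One early-exit branch, viewed as a chain of layers ending at its exit (hypothetical model)."""
    accuracy: float         # assumed accuracy when inference exits at this branch
    device_ms: List[float]  # predicted latency of each layer on the device (ms)
    edge_ms: List[float]    # predicted latency of each layer on the edge server (ms)
    out_kb: List[float]     # output size of each layer (KB)
    input_kb: float         # size of the raw input (KB)


def transfer_ms(size_kb: float, bandwidth_kbps: float) -> float:
    # KB -> kilobits (x8), seconds -> milliseconds (x1000)
    return size_kb * 8.0 * 1000.0 / bandwidth_kbps


def plan(branches: List[ExitBranch], bandwidth_kbps: float,
         deadline_ms: float) -> Optional[Tuple[int, int, float]]:
    """Return (exit_index, partition_point, latency_ms), or None if nothing meets the deadline.

    Partition point p means layers [0, p) run on the device and layers [p, n) on the edge.
    """
    best: Optional[Tuple[int, int, float]] = None
    best_key: Optional[Tuple[float, float]] = None
    for exit_idx, b in enumerate(branches):
        n = len(b.device_ms)
        for p in range(n + 1):
            device = sum(b.device_ms[:p])
            edge = sum(b.edge_ms[p:])
            if p == n:
                upload = 0.0  # everything runs locally, nothing is uploaded
            else:
                # Upload the raw input (p == 0) or the output of the last device layer.
                size = b.input_kb if p == 0 else b.out_kb[p - 1]
                upload = transfer_ms(size, bandwidth_kbps)
            total = device + upload + edge
            if total > deadline_ms:
                continue
            # Prefer higher accuracy; break ties with lower latency.
            key = (b.accuracy, -total)
            if best_key is None or key > best_key:
                best, best_key = (exit_idx, p, total), key
    return best


# Hypothetical two-exit model: a shallow early exit and the full network.
branches = [
    ExitBranch(0.70, [10, 10], [1, 1], [50, 1], 100),
    ExitBranch(0.90, [10, 10, 10, 10], [1, 1, 1, 1], [50, 20, 5, 1], 100),
]
print(plan(branches, bandwidth_kbps=8000, deadline_ms=50))  # (1, 3, 36.0)
```

With a 50 ms deadline the full network fits when the first three layers run on the device; tightening the deadline to 30 ms forces the shallower exit. For the dynamic environment, one could precompute `plan` for each discretized bandwidth state and switch plans whenever a change point detector reports a new state; that extension is not sketched here.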

Niagara: Scheduling DNN Inference Services on Heterogeneous Edge Processors

Efficient Partitioning and Communication Scheme-Based Distributed Edge Computing to Accelerate Deep Neural Network

Extendable Multi-Device Collaborative Pipeline Parallel Inference in the Edge-Cloud Scenario

Distributed Assignment With Load Balancing for DNN Inference at the Edge

CoEdge: Cooperative DNN Inference With Adaptive Workload Partitioning Over Heterogeneous Edge Devices

Edge AI: On-Demand Accelerating Deep Neural Network Inference via Edge Computing

Joint Architecture Design and Workload Partitioning for DNN Inference on Industrial IoT Clusters

BCEdge: SLO-Aware DNN Inference Services With Adaptive Batch-Concurrent Scheduling on Edge Devices

EdgeSP: Scalable Multi-device Parallel DNN Inference on Heterogeneous Edge Clusters

A DNN inference acceleration algorithm combining model partition and task allocation in heterogeneous edge computing system

HiTDL: High-Throughput Deep Learning Inference at the Hybrid Mobile Edge

HiDP: Hierarchical DNN Partitioning for Distributed Inference on Heterogeneous Edge Platforms

Ace-Sniper: Cloud-Edge Collaborative Scheduling Framework With DNN Inference Latency Modeling on Heterogeneous Devices

BCEdge: SLO-Aware DNN Inference Services with Adaptive Batching on Edge Platforms

An Online Approach for DNN Model Caching and Processor Allocation in Edge Computing

Efficient CUDA stream management for multi-DNN real-time inference on embedded GPUs

Partitioning and Deployment of Deep Neural Networks on Edge Clusters

Adaptive Scheduling for Edge-Assisted DNN Serving

EdgeCI: Distributed Workload Assignment and Model Partitioning for CNN Inference on Edge Clusters

An Adaptive DNN Inference Acceleration Framework with End–edge–cloud Collaborative Computing

Collaborative Inference for Deep Neural Networks in Edge Environments