Abstract: Deep neural networks (DNNs) sustain high performance in today's data processing applications. DNN inference is resource-intensive and thus difficult to fit on a mobile device. An alternative is to offload DNN inference to a cloud server. However, this approach requires transmitting large volumes of raw data between the mobile device and the cloud server, which is unsuitable for mission-critical and privacy-sensitive applications such as autopilot. To address this problem, recent work delivers DNN services using the edge computing paradigm. Existing approaches split a DNN into two parts and deploy the two partitions to computation nodes at two edge computing tiers. Nonetheless, these methods overlook the collaborative computation resources of the device, the edge, and the cloud. Besides, previous algorithms require re-partitioning the whole DNN to adapt to changes in computation resources and network conditions. Moreover, for resource-demanding convolutional layers, prior works do not provide a parallel processing strategy at the edge side that avoids loss of accuracy. To tackle these issues, we propose D3, a dynamic DNN decomposition system for synergistic inference without precision loss. The proposed system introduces a heuristic algorithm, named the horizontal partition algorithm, that splits a DNN into three parts. The algorithm can partially adjust the partitions at run time according to processing time and network conditions. At the edge side, a vertical separation module separates feature maps into tiles that can be run independently on different edge nodes in parallel. Extensive quantitative evaluation of five popular DNNs shows that D3 outperforms state-of-the-art counterparts by up to 3.4 times in end-to-end DNN inference time and reduces backbone network communication overhead by up to 3.68 times.
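The abstract does not give the cost model behind the horizontal partition algorithm. As a minimal sketch of what a device-edge-cloud split trades off, assume a hypothetical latency model: each tier's compute time is its layers' FLOPs divided by that tier's speed, each transfer costs bytes divided by link bandwidth, data bound for the cloud is relayed through the edge, and returning the final output is ignored. The code enumerates both cut points exhaustively in O(n^2) instead of using D3's heuristic; `Layer`, `three_way_partition`, and all parameters are illustrative names, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Layer:
    name: str
    flops: float      # compute cost of the layer (hypothetical units)
    out_bytes: float  # size of the layer's output feature map


def three_way_partition(
    layers: List[Layer],
    input_bytes: float,
    device_speed: float,
    edge_speed: float,
    cloud_speed: float,
    dev_edge_bw: float,
    edge_cloud_bw: float,
) -> Tuple[int, int, float]:
    """Exhaustively search cut points (i, j) with 0 <= i <= j <= n.

    layers[:i] run on the device, layers[i:j] on the edge, layers[j:] in the cloud.
    Returns (i, j, estimated end-to-end latency) under the assumed cost model.
    """
    n = len(layers)
    # Prefix sums give the compute cost of any contiguous segment in O(1).
    prefix = [0.0]
    for layer in layers:
        prefix.append(prefix[-1] + layer.flops)

    def cut_bytes(k: int) -> float:
        # Data crossing a cut placed before layer k: the raw input if k == 0,
        # otherwise the output feature map of layer k - 1.
        return input_bytes if k == 0 else layers[k - 1].out_bytes

    best: Optional[Tuple[int, int, float]] = None
    for i in range(n + 1):
        for j in range(i, n + 1):
            t = prefix[i] / device_speed                # device compute
            t += (prefix[j] - prefix[i]) / edge_speed   # edge compute
            t += (prefix[n] - prefix[j]) / cloud_speed  # cloud compute
            if i < n:  # some layers run beyond the device: upload to the edge
                t += cut_bytes(i) / dev_edge_bw
            if j < n:  # some layers run in the cloud: forward over the backbone
                t += cut_bytes(j) / edge_cloud_bw
            if best is None or t < best[2]:
                best = (i, j, t)
    assert best is not None  # the loop always visits at least (0, 0)
    return best


# Toy usage: a slow device, a faster edge, a fast cloud, and a slow backbone link.
layers = [
    Layer("conv1", 1, 10),
    Layer("conv2", 3, 4),
    Layer("conv3", 4, 1),
    Layer("fc", 16, 1),
]
print(three_way_partition(layers, 20, 1, 4, 16, 2, 1))  # (2, 3, 9.0)
```

In this toy model the device runs `conv1` and `conv2` to shrink the data before uploading it, the edge runs `conv3` to shrink it further before the backbone, and the cloud runs `fc`. Re-running the full search whenever bandwidth estimates change is the whole-model re-partitioning the abstract argues against; D3 instead adjusts the partitions partially at run time.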
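The abstract says the vertical separation module splits feature maps into tiles that run independently on different edge nodes without precision loss. One standard way to obtain that property, assumed here rather than taken from the paper, is to give each tile a halo: for a stride-1 convolution with a k_h-row kernel and no padding, output rows [r0, r1) depend only on input rows [r0, r1 + k_h - 1). The pure-Python sketch below tiles along the height only; `conv2d_valid`, `split_rows`, and `tiled_conv2d` are hypothetical helpers.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(x: Matrix, w: Matrix) -> Matrix:
    """Stride-1, no-padding 2-D convolution (cross-correlation, as in DNN frameworks)."""
    k_h, k_w = len(w), len(w[0])
    out_h, out_w = len(x) - k_h + 1, len(x[0]) - k_w + 1
    return [
        [
            sum(x[r + a][c + b] * w[a][b] for a in range(k_h) for b in range(k_w))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def split_rows(out_h: int, parts: int) -> List[range]:
    """Split output rows [0, out_h) into `parts` contiguous, near-equal ranges."""
    base, extra = divmod(out_h, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def tiled_conv2d(x: Matrix, w: Matrix, parts: int) -> Matrix:
    """Compute conv2d_valid(x, w) as `parts` independent row tiles.

    Each tile carries a halo of k_h - 1 extra input rows, so it needs no data
    from other tiles while it runs (each call could go to a separate edge node).
    """
    k_h = len(w)
    out_h = len(x) - k_h + 1
    result: Matrix = []
    for rows in split_rows(out_h, parts):
        if not rows:  # more parts than output rows: skip empty tiles
            continue
        tile = x[rows.start : rows.stop + k_h - 1]  # tile rows plus halo
        result.extend(conv2d_valid(tile, w))
    return result


# Integer inputs make the equality check exact.
x = [[(r * 7 + c * 3) % 5 for c in range(6)] for r in range(8)]
w = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
assert tiled_conv2d(x, w, parts=3) == conv2d_valid(x, w)
```

Because each output element is computed from the same inputs with the same operations as in the untiled convolution, the concatenated result is identical rather than approximate. With padding, strides larger than 1, or several stacked layers processed per tile, the halo must be derived from the combined receptive field instead of k_h - 1.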

Partitioning DNNs for Optimizing Distributed Inference Performance on Cooperative Edge Devices: A Genetic Algorithm Approach

Efficient Partitioning and Communication Scheme-Based Distributed Edge Computing to Accelerate Deep Neural Network

Genetic Algorithm-Based Online-Partitioning BranchyNet for Accelerating Edge Inference

Extendable Multi-Device Collaborative Pipeline Parallel Inference in the Edge-Cloud Scenario

Joint Optimization of Device Placement and Model Partitioning for Cooperative DNN Inference in Heterogeneous Edge Computing

Model Parallelism Optimization for Distributed DNN Inference on Edge Devices

DNN Inference Acceleration with Partitioning and Early Exiting in Edge Computing

Partitioning and Deployment of Deep Neural Networks on Edge Clusters

CoEdge: Cooperative DNN Inference With Adaptive Workload Partitioning Over Heterogeneous Edge Devices

A DNN inference acceleration algorithm combining model partition and task allocation in heterogeneous edge computing system

Towards Resource-aware DNN Partitioning for Edge Devices with Heterogeneous Resources

DNN Partition and Offloading Strategy with Improved Particle Swarm Genetic Algorithm in VEC

Joint Optimization With DNN Partitioning and Resource Allocation in Mobile Edge Computing

A Survey on Deep Neural Network Partition over Cloud, Edge and End Devices

Joint multi-user DNN partitioning and task offloading in mobile edge computing

Energy-Efficient DNN Partitioning and Offloading for Task Completion Rate Maximization in Multiuser Edge Intelligence

Cost-Driven Offloading for DNN-based Applications over Cloud, Edge and End Devices

An Adaptive Task Migration Scheduling Approach for Edge-Cloud Collaborative Inference

Deep Neural Network Task Partitioning and Offloading for Mobile Edge Computing

Dynamic DNN Decomposition for Lossless Synergistic Inference