Abstract: By combining edge computing and parallel computing, distributed edge computing has emerged as a new paradigm for exploiting the rapidly growing number of IoT devices at the edge. To accelerate computation at the edge, i.e., the inference tasks of DNN-driven applications, the parallelism of both computation and communication must be considered, which has led to the problem of Minimum Latency joint Communication and Computation Scheduling (MLCCS). However, existing work relies on two rigid assumptions: that the communication time of each device is fixed, and that the workload can be split into arbitrarily small pieces. To make the problem more practical and general, this paper studies MLCCS without these assumptions. First, the MLCCS problem is formulated under a general model and proved to be NP-hard. Second, a pyramid-based computing model is proposed that considers the parallelism of communication and computation jointly and has an approximation ratio of $1+\delta$, where $\delta$ depends on the devices' communication rates. An interesting property of this computing model is identified and proved: when all devices share the same communication rate, the optimal latency can be obtained under an arbitrary scheduling order. For the case in which the workload cannot be split arbitrarily, an approximation algorithm with a ratio of at most $2\cdot(1+\delta)$ is proposed. In addition, several algorithms are proposed to handle dynamically changing network scenarios. Finally, theoretical analysis and simulation results verify that the proposed algorithm performs well in terms of latency. Two testbed experiments are also conducted; they show that the proposed method outperforms existing methods, reducing latency by up to 29.2% for inference tasks at the edge.
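The order-independence property mentioned in the abstract can be illustrated with a much simpler model than the one the paper studies. The sketch below is not the paper's pyramid-based model or its algorithm. It assumes a classic divisible-load star network in which a master sends shares to devices one at a time over a single link, each device starts computing once its share has arrived, and the workload is arbitrarily divisible (the very assumption the paper relaxes). The names `optimal_split`, `finish_times`, `comm`, and `comp` are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def optimal_split(comm, comp, total=1.0):
    """Return (fractions, makespan) for a fixed dispatch order.

    comm[i]: assumed time to send one unit of workload to device i
    comp[i]: assumed time for device i to process one unit of workload
    """
    n = len(comm)
    # Unnormalised shares: device k+1 must finish when device k does, so
    # comp[k] * a[k] = (comm[k+1] + comp[k+1]) * a[k+1].
    shares = [1.0]
    for k in range(n - 1):
        shares.append(shares[k] * comp[k] / (comm[k + 1] + comp[k + 1]))
    scale = total / sum(shares)
    fractions = [s * scale for s in shares]
    # The first device receives its share and then computes it.
    makespan = fractions[0] * (comm[0] + comp[0])
    return fractions, makespan


def finish_times(comm, comp, fractions):
    """Finish time of each device under sequential dispatch."""
    times, sent = [], 0.0
    for z, w, a in zip(comm, comp, fractions):
        sent += z * a  # the link is busy until this share is delivered
        times.append(sent + w * a)
    return times


if __name__ == "__main__":
    comp = [2.0, 3.0, 5.0]  # hypothetical per-unit compute times
    rate = 0.5              # identical per-unit communication time
    for order in permutations(range(3)):
        c = [comp[i] for i in order]
        _, t = optimal_split([rate] * 3, c)
        print(order, round(t, 6))
```

Under these assumptions, every permutation of the devices prints the same makespan when the communication times are identical, whereas unequal communication times generally make the dispatch order matter.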
