Abstract: By combining edge computing and parallel computing, distributed edge computing has emerged as a new paradigm for exploiting the growing number of IoT devices at the edge. To accelerate computation at the edge, i.e., the inference tasks of DNN-driven applications, distributed edge computing must consider the parallelism of both computation and communication, which motivates the problem of Minimum Latency joint Communication and Computation Scheduling (MLCCS). However, existing works rely on two rigid assumptions: that the communication time of each device is fixed, and that the workload can be split into arbitrarily small pieces. To make the work more practical and general, this paper studies the MLCCS problem without these assumptions. First, the MLCCS problem is formulated under a general model and proved to be NP-hard. Second, a pyramid-based computing model is proposed to consider the parallelism of communication and computation jointly; it has an approximation ratio of $1+\delta$, where $\delta$ is related to the devices' communication rates. An interesting property of this computing model is identified and proved, i.e., the optimal latency can be obtained under an arbitrary scheduling order when all devices share the same communication rate. For the case where the workload cannot be split arbitrarily, an approximation algorithm with a ratio of at most $2\cdot (1+\delta)$ is proposed. Additionally, several algorithms are proposed to handle dynamically changing network scenarios. Finally, theoretical analysis and simulation results verify that the proposed algorithms achieve low latency. Two testbed experiments are also conducted, which show that the proposed method outperforms existing methods, reducing latency by up to 29.2% for inference tasks at the edge.
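The order-independence property mentioned in the abstract can be illustrated with a minimal sketch. It does not implement the paper's pyramid-based model, which the abstract does not specify. Instead it assumes a classical divisible-load setting: a source sends shares of the workload to devices one after another, each device starts computing once its share has arrived, and the shares are chosen so that all devices finish at the same time. The function name `equal_finish_makespan` and the per-unit time parameters are hypothetical and chosen only for illustration.

```python
from itertools import permutations

def equal_finish_makespan(devices, workload=1.0):
    """Latency when shares are sent in list order and all devices finish together.

    devices: list of (comm_time_per_unit, compute_time_per_unit) tuples.
    Assumed model (not taken from the paper): transfers are sequential,
    computation on each device starts after its own transfer completes.
    """
    # Unnormalised shares: the first device gets weight 1; each later device
    # gets the weight that makes it finish exactly when its predecessor does.
    weights = [1.0]
    for (_, prev_compute), (comm, compute) in zip(devices, devices[1:]):
        weights.append(weights[-1] * prev_compute / (comm + compute))
    first_share = workload / sum(weights)
    first_comm, first_compute = devices[0]
    # The first device receives its share, then computes it; all others tie.
    return (first_comm + first_compute) * first_share

# Same communication rate for every device, different compute speeds.
same_rate = [(0.2, 1.0), (0.2, 2.0), (0.2, 4.0)]
same_rate_spans = [equal_finish_makespan(list(p)) for p in permutations(same_rate)]

# Different communication rates: the send order now changes the latency.
mixed_rate = [(0.1, 1.0), (0.5, 1.0)]
mixed_rate_spans = [equal_finish_makespan(list(p)) for p in permutations(mixed_rate)]

print(max(same_rate_spans) - min(same_rate_spans))
print(max(mixed_rate_spans) - min(mixed_rate_spans))
```

Under these assumptions, every permutation of `same_rate` gives the same latency up to floating-point rounding, while the two orderings of `mixed_rate` give visibly different latencies. This toy behaviour is only analogous to the property proved in the paper.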
