Towards Real-Time Inference Offloading with Distributed Edge Computing: the Framework and Algorithms
Quan Chen, Song Guo, Kaijia Wang, Wenchao Xu, Jing Li, Zhipeng Cai, Hong Gao, Albert Y. Zomaya
DOI: https://doi.org/10.1109/tmc.2023.3335051
IF: 6.075
2024-01-01
IEEE Transactions on Mobile Computing
Abstract: By combining edge computing and parallel computing, distributed edge computing has emerged as a new paradigm for exploiting the growing number of IoT devices at the edge. To accelerate computation at the edge, i.e., the inference tasks of DNN-driven applications, distributed edge computing must consider the parallelism of both computation and communication; this motivates the problem of Minimum Latency joint Communication and Computation Scheduling (MLCCS). However, existing works rely on two rigid assumptions: that the communication time of each device is fixed, and that the workload can be split into arbitrarily small pieces. To make the work more practical and general, this paper studies the MLCCS problem without these assumptions. Firstly, the MLCCS problem under a general model is formulated and proved to be NP-hard. Secondly, a pyramid-based computing model is proposed to consider the parallelism of communication and computation jointly; it has an approximation ratio of $1+\delta$, where $\delta$ is related to the devices' communication rates. An interesting property of this computing model is identified and proved, i.e., the optimal latency can be obtained under an arbitrary scheduling order when all the devices share the same communication rate. For the case where the workload cannot be split arbitrarily, an approximation algorithm with a ratio of at most $2\cdot (1+\delta )$ is proposed. Additionally, several algorithms are proposed to handle dynamically changing network scenarios. Finally, theoretical analysis and simulation results verify that the proposed algorithms achieve low latency. Two testbed experiments are also conducted, which show that the proposed method outperforms existing methods, reducing the latency of inference tasks at the edge by up to 29.2%.
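The order-independence property stated in the abstract can be illustrated with a small numerical sketch. The model below is an assumption, not the paper's pyramid-based model: it is the classic divisible-load setting in which a source sends shares to devices one at a time over a single channel, each device starts computing once its share has fully arrived, and shares are sized so that all devices finish at the same instant. The function names and the per-unit costs `comm_costs` and `comp_costs` are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def equal_finish_makespan(workload, comm_costs, comp_costs):
    """Makespan of a toy divisible-load schedule (assumed model).

    Shares are sent sequentially in list order; comm_costs[i] and
    comp_costs[i] are the time per unit of workload for sending to and
    computing on device i. Shares are sized so all devices finish together.
    """
    # Unnormalised shares: device i must receive and compute its share
    # in exactly the time device i-1 spends computing its own share.
    shares = [1.0]
    for i in range(1, len(comp_costs)):
        shares.append(
            shares[-1] * comp_costs[i - 1] / (comm_costs[i] + comp_costs[i])
        )
    first_share = shares[0] / sum(shares)
    # Every device finishes when the first one does.
    return workload * first_share * (comm_costs[0] + comp_costs[0])


def makespans_over_orders(workload, devices):
    """Evaluate every scheduling order of (comm_cost, comp_cost) pairs."""
    results = []
    for order in permutations(devices):
        comm = [c for c, _ in order]
        comp = [w for _, w in order]
        results.append(equal_finish_makespan(workload, comm, comp))
    return results


if __name__ == "__main__":
    same_rate = [(0.2, 1.0), (0.2, 2.0), (0.2, 0.5)]
    mixed_rate = [(0.1, 1.0), (0.4, 2.0), (0.2, 0.5)]
    print("equal communication cost:", makespans_over_orders(1.0, same_rate))
    print("mixed communication cost:", makespans_over_orders(1.0, mixed_rate))
```

In this toy setting, the makespans for the equal-cost devices coincide across all orders up to floating-point rounding, while the mixed-cost devices give order-dependent values. Whether the same mechanism underlies the paper's proof is not established by this sketch.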