Dynamic Batching and Early-Exiting for Accurate and Timely Edge Inference

Yechao She, Tuo Shi, Jianping Wang, Bin Liu
DOI: https://doi.org/10.1109/vtc2024-spring62846.2024.10682995
2024-01-01
Abstract: This work aims to design a real-time inference scheduler that delivers accurate and timely edge inference services for dynamic inference arrivals by leveraging dynamic batching and early-exiting techniques. Specifically, we consider an edge inference server that is preinstalled with multiple early-exit Deep Neural Networks (DNNs) that support batch processing. Inference tasks with strict deadline requirements arrive at the edge server randomly, and the utility of each timely processed task depends on the achieved accuracy. Therefore, we aim to design an edge inference scheduler that maximizes the system's total utility subject to resource and deadline constraints. We formulate this problem as a mixed-integer linear program. The problem is challenging due to its high computational complexity, coupled decision-making, and task randomness. We propose to decompose the original problem into two sub-problems: the task assignment problem and the DNN configuration problem. For the task assignment problem, we develop a greedy task assignment algorithm. For the DNN configuration problem, we propose a Deep Reinforcement Learning-based solution. Simulation results show that the proposed algorithms outperform the state-of-the-art baselines.
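The abstract does not spell out the greedy assignment rule, so the following is only an illustrative sketch of how a greedy task assignment step could look once the DNN configuration is fixed. It is not the authors' algorithm. The names `ModelSlot` and `greedy_assign`, the earliest-deadline-first ordering, and the choice to equate a task's utility with the accuracy of the exit it is served by are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ModelSlot:
    """Hypothetical record for one configured early-exit DNN batch."""
    name: str
    accuracy: float      # accuracy at the chosen exit (assumed to equal task utility)
    batch_size: int      # maximum number of tasks the batch can hold
    finish_time: float   # time at which the batch finishes processing
    assigned: list = field(default_factory=list)


def greedy_assign(deadlines, slots):
    """Assign tasks to slots greedily (assumed rule, for illustration only).

    Tasks are visited in order of increasing deadline. Each task takes the
    most accurate slot that still has room and finishes by its deadline.
    Tasks with no feasible slot are left unassigned (mapped to None).
    """
    assignment = {}
    total_utility = 0.0
    for task, deadline in sorted(deadlines.items(), key=lambda kv: kv[1]):
        feasible = [
            s for s in slots
            if len(s.assigned) < s.batch_size and s.finish_time <= deadline
        ]
        if not feasible:
            assignment[task] = None  # task misses its deadline and earns no utility
            continue
        best = max(feasible, key=lambda s: s.accuracy)
        best.assigned.append(task)
        assignment[task] = best.name
        total_utility += best.accuracy
    return assignment, total_utility


# Example: two configured batches and four tasks with different deadlines.
slots = [
    ModelSlot(name="A", accuracy=0.9, batch_size=2, finish_time=10.0),
    ModelSlot(name="B", accuracy=0.7, batch_size=2, finish_time=4.0),
]
deadlines = {"t1": 5.0, "t2": 12.0, "t3": 12.0, "t4": 3.0}
assignment, total_utility = greedy_assign(deadlines, slots)
```

In this example, task `t4` cannot be served by either batch before its deadline and is dropped, `t1` can only meet its deadline on the faster, less accurate batch `B`, and `t2` and `t3` are placed on the more accurate batch `A`. The sketch treats the batch finish times as given; in the paper's setting they would follow from the DNN configuration decision (batch sizes and exit points), which is the part the abstract says is handled by Deep Reinforcement Learning.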