BCEdge: SLO-Aware DNN Inference Services With Adaptive Batch-Concurrent Scheduling on Edge Devices

Ziyang Zhang, Yang Zhao, Huan Li, Jie Liu
DOI: https://doi.org/10.1109/tnsm.2024.3409701
2024-08-25
IEEE Transactions on Network and Service Management
Abstract: As deep neural networks (DNNs) are increasingly used in a broad spectrum of edge intelligent applications, edge platforms often need to provide multi-DNN model inference services, and it is nontrivial for them to deliver high throughput and low latency simultaneously. Edge devices that serve multiple DNN models pose new challenges for scheduler design. First, edge devices should be capable of efficiently scheduling multiple heterogeneous DNN models in order to optimize system utilization. Second, different inference requests may have different service level objectives (SLOs), which the scheduler must respect to improve quality of service (QoS). To address these challenges, this paper proposes BCEdge, a novel learning-based scheduling framework that incorporates adaptive batching and concurrent execution of DNN inference services on edge devices. We first propose a shared memory policy to reduce memory contention among multiple DNN models. Then, a utility function is defined to evaluate the trade-off between throughput and latency. The scheduler in BCEdge leverages branch-based deep reinforcement learning (DRL) to maximize utility by 1) optimizing the batch size, 2) automatically identifying the number of concurrent instances for multiple DNN models, and 3) determining the shared memory configuration among multiple DNN models. In addition, the lightweight DNN-based prediction model in BCEdge achieves SLO awareness by reducing performance interference among multiple DNN models. Our prototype, implemented on various edge devices, shows that BCEdge improves utility by up to 37.6% and reduces memory usage by up to 38% on average compared to state-of-the-art schemes, while keeping the SLO violation rate within 5%.
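The abstract names two ingredients of the scheduler, a throughput/latency utility function and a branch-based DRL action space, without giving their exact form. The following Python sketch illustrates both under stated assumptions; it is not the authors' implementation. The utility form `log(throughput) - alpha * log(latency)` with a hard SLO cutoff, the candidate values in `BRANCHES`, and all function names are hypothetical. The sketch also shows the usual motivation for action branching: each sub-action (batch size, concurrent instances, shared memory setting) gets its own output head, so the number of network outputs grows with the sum of the choices rather than their product.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical discrete choices per action branch (illustrative values,
# NOT taken from the BCEdge paper).
BRANCHES: Dict[str, List[float]] = {
    "batch_size": [1, 2, 4, 8, 16],
    "instances": [1, 2, 3, 4],
    "shared_mem_fraction": [0.25, 0.5, 0.75],
}


def utility(throughput_rps: float, latency_ms: float, slo_ms: float,
            alpha: float = 1.0) -> float:
    """Hypothetical throughput/latency utility (an assumption, not the paper's).

    Rewards higher throughput and penalizes higher latency on a log scale.
    A configuration whose latency exceeds the request's SLO scores -inf,
    so it is never preferred over one that meets the SLO.
    """
    if latency_ms > slo_ms:
        return float("-inf")
    return math.log(throughput_rps) - alpha * math.log(latency_ms)


def select_branched_action(q_values: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Pick each sub-action independently by argmax over its own branch.

    In a branching DRL agent, a shared network body would produce these
    per-branch Q-values from the current system state; here they are
    simply passed in as plain lists.
    """
    action = {}
    for name, qs in q_values.items():
        choices = BRANCHES[name]
        if len(qs) != len(choices):
            raise ValueError(f"branch {name!r}: expected {len(choices)} Q-values")
        best = max(range(len(qs)), key=lambda i: qs[i])
        action[name] = choices[best]
    return action


def output_sizes() -> Tuple[int, int]:
    """Outputs needed: one per joint action vs. one per sub-action."""
    joint = math.prod(len(c) for c in BRANCHES.values())
    branched = sum(len(c) for c in BRANCHES.values())
    return joint, branched


if __name__ == "__main__":
    print(output_sizes())                      # (60, 12)
    print(utility(200.0, 60.0, slo_ms=50.0))   # -inf: SLO violated
    # Made-up Q-values, for illustration only.
    print(select_branched_action({
        "batch_size": [0.1, 0.5, 0.9, 0.7, 0.2],
        "instances": [0.3, 0.8, 0.6, 0.1],
        "shared_mem_fraction": [0.2, 0.4, 0.9],
    }))
```

With the hypothetical choice sets above, a flat DRL agent would need 5 × 4 × 3 = 60 outputs, whereas a branched agent needs 5 + 4 + 3 = 12. That gap widens quickly as more knobs or finer-grained values are added.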
computer science, information systems