Kalmia: A Heterogeneous QoS-aware Scheduling Framework for DNN Tasks on Edge Servers

Ziyan Fu, Ju Ren, Deyu Zhang, Yuezhi Zhou, Yaoxue Zhang
DOI: https://doi.org/10.1109/infocom48880.2022.9796661
2022-01-01
Abstract: Motivated by the popularity of edge intelligence, DNN services have been widely deployed at the edge, posing significant performance pressure on edge servers. How to improve the QoS of edge DNN services becomes a crucial and challenging problem. Previous works, however, did not fully consider the heterogeneous QoS requirements of urgent and non-urgent tasks, causing frequent QoS violations. Meanwhile, our empirical study shows that severe task interference exists in concurrent DNN tasks, further degrading the timeliness of urgent tasks and the throughput of non-urgent tasks. To address these issues, we propose Kalmia, a heterogeneous QoS-aware framework for DNN inference task scheduling on edge servers. Specifically, Kalmia includes an offline profiling stage and an online scheduling policy. In offline profiling, we build a regression model to predict the execution time of tasks. During online scheduling, we classify the tasks into urgent and non-urgent tasks and distribute them into two CUDA contexts. By a tailored scheduling strategy, non-urgent tasks can fully utilize the computing resources for throughput improvement, while the timeliness of urgent tasks can be guaranteed via preemption. Experimental results demonstrate that Kalmia can achieve up to 2.8× improvement in throughput and significantly reduce the deadline violation rate compared with state-of-the-art methods.
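The two stages described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the single `batch_size` feature, the `slack_factor` urgency rule, and the names `fit_linear`, `Task`, and `TwoQueueScheduler` are all hypothetical, and two in-memory queues stand in for the two CUDA contexts. Preemption is approximated by always dispatching urgent tasks first; interrupting a running GPU kernel is not modeled.

```python
from collections import deque
from dataclasses import dataclass


def fit_linear(xs, ys):
    """Offline profiling stand-in: least-squares fit of y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


@dataclass
class Task:
    name: str
    batch_size: int     # hypothetical profiling feature
    deadline_ms: float  # relative deadline of the request


class TwoQueueScheduler:
    """Online stage stand-in: one queue per (assumed) CUDA context."""

    def __init__(self, model, slack_factor=2.0):
        self.a, self.b = model
        self.slack_factor = slack_factor  # assumed urgency threshold
        self.urgent = deque()
        self.non_urgent = deque()

    def predict_ms(self, task):
        # Predicted execution time from the offline regression model.
        return self.a * task.batch_size + self.b

    def submit(self, task):
        # Assumed rule: a task is urgent when its deadline leaves less
        # than slack_factor times its predicted execution time.
        if task.deadline_ms < self.slack_factor * self.predict_ms(task):
            self.urgent.append(task)
        else:
            self.non_urgent.append(task)

    def next_task(self):
        # Urgent tasks always go first; non-urgent tasks fill idle time.
        if self.urgent:
            return self.urgent.popleft()
        if self.non_urgent:
            return self.non_urgent.popleft()
        return None
```

In this sketch, a scheduler built from profiled (batch size, latency) pairs dispatches a tight-deadline task ahead of a loose-deadline task that was submitted earlier, which mirrors the priority that the abstract attributes to urgent tasks.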