HiTDL: High-Throughput Deep Learning Inference at the Hybrid Mobile Edge

Jing Wu,Lin Wang,Qiangyu Pei,Xingqi Cui,Fangming Liu,Tingting Yang
DOI: https://doi.org/10.1109/tpds.2022.3195664
IF: 5.3
2022-09-03
IEEE Transactions on Parallel and Distributed Systems
Abstract:Deep neural networks (DNNs) have become a critical component for inference in modern mobile applications, but the efficient provisioning of DNNs is non-trivial. Existing mobile- and server-based approaches compromise either the inference accuracy or latency. Instead, a hybrid approach can reap the benefits of the two by splitting the DNN at an appropriate layer and running the two parts separately on the mobile and the server respectively. Nevertheless, the DNN throughput in the hybrid approach has not been carefully examined, which is particularly important for edge servers where limited compute resources are shared among multiple DNNs. This article presents HiTDL, a runtime framework for managing multiple DNNs provisioned following the hybrid approach at the edge. HiTDL's mission is to improve edge resource efficiency by optimizing the combined throughput of all co-located DNNs, while still guaranteeing their SLAs. To this end, HiTDL first builds comprehensive performance models for DNN inference latency and throughout with respect to multiple factors including resource availability, DNN partition plan, and cross-DNN interference. HiTDL then uses these models to generate a set of candidate partition plans with SLA guarantees for each DNN. Finally, HiTDL makes global throughput-optimal resource allocation decisions by selecting partition plans from the candidate set for each DNN via solving a fairness-aware multiple-choice knapsack problem. Experimental results based on a prototype implementation show that HiTDL improves the overall throughput of the edge by 4.3× compared with the state-of-the-art.
computer science, theory & methods,engineering, electrical & electronic
What problem does this paper attempt to address?