EdgeSP: Scalable Multi-device Parallel DNN Inference on Heterogeneous Edge Clusters

Zhipeng Gao, Shan Sun, Yinghan Zhang, Zijia Mo, Chen Zhao
DOI: https://doi.org/10.1007/978-3-030-95388-1_21
2022-01-01
Abstract: Edge computing has emerged as a promising line of research for processing large-scale data and providing low-latency services. Unfortunately, deploying deep neural networks (DNNs) on resource-limited edge devices presents unacceptable latency, hindering artificial intelligence from empowering edge devices. Prior solutions attempted to address this issue by offloading workload to the remote cloud. However, the cloud-assisted approach ignores that devices in the edge environment tend to exist as clusters. In this paper, we propose EdgeSP, a scalable multi-device parallel DNN inference framework that maximizes resource utilization of heterogeneous edge device clusters. We design a multiple fused-layer blocks parallelization strategy to reduce inter-device communication during parallel inference. Further, we add early exit branches to DNNs, empowering the device to trade off latency and accuracy for a variety of sophisticated tasks. Experimental results show that EdgeSP enables inference latency acceleration of 2.3×–3.7× for DNN inference tasks of various scales and outperforms the existing naive parallel inference method. Additionally, EdgeSP can provide high accuracy inference services under various latency requirements.