Accelerating DNN Inference by Edge-Cloud Collaboration

Jianan Chen, Qi Qi, Jingyu Wang, Haifeng Sun, Jianxin Liao
DOI: https://doi.org/10.1109/IPCCC51483.2021.9679434
2021-01-01
Abstract: Deep neural networks (DNNs) have become indispensable tools for intelligent applications, and the demand for deploying DNNs on edge devices is increasing dramatically. Deployment is challenging because DNN inference is computation-intensive, whereas edge devices are resource-constrained. Prior solutions addressed this challenge through collaboration between the cloud and edge devices, but they do not take the inference request rate into account, even though the inference delay increases dramatically as the request rate rises. In this paper, we propose a scheme that dynamically partitions a DNN into two or three parts and distributes them between the edge and the cloud, achieving the lowest delay as the request rate changes. The scheme selects the optimal partition points of the DNN using a layer evaluation model (LEM) and a total delay prediction model (DPM) under different request rates. Experiments with distributed deployment of the AlexNet, VGG, NiN, and ResNet models on the ImageNet image classification dataset show that the proposed scheme significantly reduces the total end-to-end latency by fully using both edge and cloud resources. Compared with the state-of-the-art partition approach, it reduces the inference delay by a factor of 1.3 to 1.6 and improves the throughput by a factor of 1.2 to 1.7.
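The idea of choosing a partition point that depends on the request rate can be illustrated with a minimal sketch. This is not the paper's LEM or DPM: the abstract does not specify either model, so the sketch substitutes a simple M/M/1 queueing approximation for the delay of each stage (edge compute, network transfer, cloud compute) and considers only a two-part split. The function names `stage_delay` and `best_partition`, and all per-layer timings, output sizes, and the bandwidth value, are hypothetical and chosen only for illustration.

```python
import math

def stage_delay(service_ms, rate_per_ms):
    """Mean time in an M/M/1 queue with the given service time.

    Assumption: M/M/1 is a stand-in for the paper's delay prediction model.
    """
    if service_ms == 0:
        return 0.0
    utilization = rate_per_ms * service_ms
    if utilization >= 1.0:
        return math.inf  # the stage cannot keep up with the request rate
    return service_ms / (1.0 - utilization)

def best_partition(edge_ms, cloud_ms, out_kb, input_kb, bw_kb_per_ms, rate_per_ms):
    """Return (k, delay): layers [0, k) run on the edge, layers [k, n) in the cloud."""
    n = len(edge_ms)
    best_k, best_delay = None, math.inf
    for k in range(n + 1):
        edge_time = sum(edge_ms[:k])
        cloud_time = sum(cloud_ms[k:])
        if k == n:
            sent_kb = 0.0            # everything runs on the edge, nothing is sent
        elif k == 0:
            sent_kb = input_kb       # the raw input is sent to the cloud
        else:
            sent_kb = out_kb[k - 1]  # the output of the last edge layer is sent
        transfer_time = sent_kb / bw_kb_per_ms
        delay = (stage_delay(edge_time, rate_per_ms)
                 + stage_delay(transfer_time, rate_per_ms)
                 + stage_delay(cloud_time, rate_per_ms))
        if delay < best_delay:
            best_k, best_delay = k, delay
    return best_k, best_delay

# Hypothetical four-layer model (all numbers are made up for illustration).
EDGE_MS = [4, 6, 10, 20]     # per-layer compute time on the edge device
CLOUD_MS = [1, 1, 2, 3]      # per-layer compute time in the cloud
OUT_KB = [300, 100, 20, 1]   # size of each layer's output
INPUT_KB = 150               # size of the raw input
BW_KB_PER_MS = 10            # edge-to-cloud bandwidth

for rate in (0.001, 0.06):   # requests per millisecond
    k, delay = best_partition(EDGE_MS, CLOUD_MS, OUT_KB, INPUT_KB, BW_KB_PER_MS, rate)
    print(f"rate={rate}/ms -> split after layer {k}, predicted delay {delay:.1f} ms")
```

With these made-up numbers, the preferred split moves deeper into the network as the request rate grows, because sending the large raw input saturates the link at high rates. This mirrors, in a simplified way, why a partition chosen without regard to request rate can become a poor choice under load.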