Cloud-Edge Inference under Communication Constraints: Data Quantization and Early Exit.

Yu Gao, Wei Wang, Dezhi Wang, Huiqiong Wang, Zhaoyang Zhang
DOI: https://doi.org/10.1109/iswcs56560.2022.9940360
2022-01-01
Abstract: The inference delay of deep neural networks (DNNs) cannot always meet application requirements because of the data transmission to the cloud; cloud-edge collaboration via DNN partitioning can effectively alleviate this delay. However, the communication capability between cloud and edge is usually limited. In this paper, we propose a threshold-based data quantization and exit (TDQE) framework, in which classification thresholds divide the data into different parts and determine whether to quantize the data for transmission under the communication constraints or to exit the DNN early at the partition point. To find the optimal thresholds, we formulate an accuracy optimization problem under communication constraints and solve it through linear programming. To reduce the impact of quantization on accuracy across the different parts of the data, we further adjust the quantization range for each part to refine the quantization performance. Based on the optimization results, the TDQE algorithm is proposed to construct the DNN partitioning with classified data processing. Finally, we evaluate the proposed method by comparing it with two traditional DNN partitioning algorithms in simulation. The results show that the proposed algorithm outperforms both baselines in accuracy and meets the real-time requirements.
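The threshold-based routing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' TDQE algorithm: the abstract does not give the score that the thresholds are applied to, the quantizer, or the bit widths, so everything below is an assumption. In particular, the sketch assumes a scalar confidence score from an edge-side exit classifier, a uniform quantizer, and hand-picked thresholds (the paper obtains its thresholds by linear programming). The names `uniform_quantize` and `tdqe_route` are hypothetical.

```python
from bisect import bisect_right


def uniform_quantize(values, bits, lo, hi):
    """Clip values to [lo, hi], snap them to 2**bits uniform levels,
    and return the dequantized values. Requires bits >= 1 and hi > lo."""
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    return [lo + round((min(max(v, lo), hi) - lo) / step) * step for v in values]


def tdqe_route(confidence, features, thresholds, bit_widths, ranges):
    """Route one sample at the partition point (illustrative assumption only).

    thresholds: ascending confidence thresholds that split samples into parts.
    bit_widths: bits per feature value for each part below the top threshold.
    ranges:     per-part (lo, hi) quantization range, mirroring the idea of
                adjusting the range for each part of the data.
    Returns (action, payload, bits_sent).
    """
    # Number of thresholds that the confidence meets or exceeds.
    part = bisect_right(thresholds, confidence)
    if part == len(thresholds):
        # Confident enough: exit early at the edge and transmit nothing.
        return ("exit", None, 0)
    bits = bit_widths[part]
    lo, hi = ranges[part]
    payload = uniform_quantize(features, bits, lo, hi)
    return ("transmit", payload, bits * len(features))


# Hypothetical settings: two thresholds, so two quantized parts plus early exit.
thresholds = [0.5, 0.9]
bit_widths = [8, 4]
ranges = [(0.0, 4.0), (0.0, 2.0)]
features = [0.0, 2.0, 3.0]

print(tdqe_route(0.95, features, thresholds, bit_widths, ranges))
print(tdqe_route(0.70, features, thresholds, bit_widths, ranges))
```

In this sketch, low-confidence samples get more bits because the cloud-side layers have more work to do on them, while high-confidence samples are answered at the edge. That allocation is a design choice made for illustration and is not taken from the paper.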