Extendable Multi-Device Collaborative Pipeline Parallel Inference in the Edge-Cloud Scenario

Junjie Hu, Danfeng Sun, Jin Fan, Junwei Dong, Baiping Chen, Huifeng Wu, Jia Wu
DOI: https://doi.org/10.1109/tce.2024.3481156
2024-01-01
IEEE Transactions on Consumer Electronics
Abstract: Resource-limited devices struggle to execute complex neural networks in edge computing scenarios, where task completion times often become unacceptable. Most existing methods either sacrifice prediction accuracy or depend heavily on the network state. We therefore propose a multi-device collaborative pipeline parallel inference method to reduce model inference time in the edge-cloud scenario. The method consists of three components: model splitting, partial knowledge distillation, and distributed model deployment. Model splitting divides the model into two parts, one for the server and the other for edge devices. Partial knowledge distillation aims to strike a balance between inference time and prediction accuracy. Distributed model deployment further splits the student model according to the number and performance of edge devices and the network conditions between them, thereby achieving pipeline-parallel model inference. After model splitting, we add plugin bottleneck layers between the partial models to compress the communicated data and improve communication efficiency. Experimental results show that, while keeping prediction accuracy close to that of the original model, the proposed method decreases the complexity of the deep learning models, reduces inference time, and improves the utilization of idle edge devices.
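The idea of splitting a model across edge devices according to their performance can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a strictly sequential model, a fixed device order, and a per-layer cost estimate, and it ignores the communication cost and bottleneck layers mentioned in the abstract. The function name `split_pipeline` and all numbers are hypothetical. In a pipeline, throughput is limited by the slowest stage, so the sketch picks contiguous layer ranges that minimize the maximum stage time.

```python
from itertools import accumulate


def split_pipeline(layer_costs, device_speeds):
    """Assign contiguous layer ranges to devices (in the given order) so that
    the slowest pipeline stage is as fast as possible.

    layer_costs   -- assumed per-layer compute cost (e.g. FLOPs), hypothetical
    device_speeds -- assumed per-device throughput (cost units per second)
    Returns (bottleneck_time, [(start, end), ...]) with end exclusive.
    """
    n, k = len(layer_costs), len(device_speeds)
    prefix = [0.0] + list(accumulate(layer_costs))
    inf = float("inf")

    # best[d][i]: smallest bottleneck time when the first d devices
    # together run the first i layers.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0

    for d in range(1, k + 1):
        for i in range(n + 1):
            for j in range(i + 1):
                # Device d-1 runs layers j..i-1 (possibly none).
                stage = (prefix[i] - prefix[j]) / device_speeds[d - 1]
                cand = max(best[d - 1][j], stage)
                if cand < best[d][i]:
                    best[d][i] = cand
                    cut[d][i] = j

    # Walk the recorded cut points backwards to recover each device's range.
    bounds = []
    i = n
    for d in range(k, 0, -1):
        j = cut[d][i]
        bounds.append((j, i))
        i = j
    bounds.reverse()
    return best[k][n], bounds


if __name__ == "__main__":
    # Two hypothetical devices; the second is twice as fast as the first.
    time, ranges = split_pipeline([4, 2, 6, 3, 5], [1.0, 2.0])
    print(time, ranges)
```

A fuller treatment would add a transfer-time term between consecutive stages, which is where compressing intermediate data with bottleneck layers would reduce the cost of each cut.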