Cloud–Edge Collaborative Inference with Network Pruning
Mingran Li, Xuejun Zhang, Jiasheng Guo, Feng Li
DOI: https://doi.org/10.3390/electronics12173598
IF: 2.9
2023-08-26
Electronics
Abstract: As model parameter counts have grown, deep neural networks (DNNs) have achieved remarkable performance in computer vision, but larger DNNs are a bottleneck for deployment on resource-constrained edge devices. Cloud–edge collaborative inference based on network pruning offers a way to deploy DNNs on edge devices. However, the pruning methods adopted by existing frameworks are only locally effective, and the compressed models are over-sparse. In this paper, we design a cloud–edge collaborative inference framework based on network pruning that makes full use of the limited computing resources on edge devices. Within this framework, we propose a sparsity-aware feature bias minimization pruning method that reduces the feature bias introduced during network pruning and prevents the pruned model from becoming over-sparse. To further reduce inference latency, we account for the difference in computing resources between edge devices and the cloud and design a task-oriented asymmetric feature coding scheme that reduces the communication overhead of transmitting intermediate data. In comprehensive experiments, our framework reduces end-to-end latency by 82% to 84% with less than 1% accuracy loss compared to a cloud–edge collaborative inference framework built on traditional methods, and it achieves the lowest end-to-end latency and accuracy loss among the frameworks compared.
engineering, electrical & electronic; computer science, information systems; physics, applied
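The latency trade-off described in the abstract can be illustrated with a minimal sketch. It assumes a simple additive model in which end-to-end latency is the sum of edge compute time, transfer time for the intermediate features, and cloud compute time. The names `SplitPlan`, `end_to_end_latency`, and `relative_reduction`, and every number below, are hypothetical placeholders chosen for illustration. They are not the paper's method or measurements.

```python
from dataclasses import dataclass


@dataclass
class SplitPlan:
    """Hypothetical description of one cloud-edge split of a DNN."""
    edge_flops: float      # work done by the (possibly pruned) edge-side layers
    cloud_flops: float     # work done by the cloud-side layers
    feature_bytes: float   # size of the intermediate features sent to the cloud


def end_to_end_latency(plan, edge_flops_per_s, cloud_flops_per_s, uplink_bytes_per_s):
    """Additive latency model (an assumption): edge compute + transfer + cloud compute."""
    edge_time = plan.edge_flops / edge_flops_per_s
    transfer_time = plan.feature_bytes / uplink_bytes_per_s
    cloud_time = plan.cloud_flops / cloud_flops_per_s
    return edge_time + transfer_time + cloud_time


def relative_reduction(before, after):
    """Fraction of latency removed when going from `before` to `after`."""
    return 1.0 - after / before


# Illustrative device and network rates (made up, not from the paper).
DEVICE = {
    "edge_flops_per_s": 1e10,
    "cloud_flops_per_s": 1e12,
    "uplink_bytes_per_s": 1e6,
}

# Unpruned edge model sending raw intermediate features.
baseline = SplitPlan(edge_flops=2e9, cloud_flops=1e10, feature_bytes=1e6)

# Pruned edge model sending compressed features (both effects are assumed sizes).
compressed = SplitPlan(edge_flops=1e9, cloud_flops=1e10, feature_bytes=2.5e5)

if __name__ == "__main__":
    t_base = end_to_end_latency(baseline, **DEVICE)
    t_comp = end_to_end_latency(compressed, **DEVICE)
    print(f"baseline:   {t_base:.3f} s")
    print(f"compressed: {t_comp:.3f} s")
    print(f"reduction:  {relative_reduction(t_base, t_comp):.1%}")
```

In this toy setting the transfer term dominates the baseline, so shrinking the transmitted features does more for latency than speeding up the edge computation alone. A real system would also need to account for encoding and decoding cost, which this sketch omits.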