A Fine-Grained End-to-End Latency Optimization Framework for Wireless Collaborative Inference
Lei Mu, Zhonghui Li, Wei Xiao, Ruilin Zhang, Peng Wang, Tao Liu, Geyong Min, Keqin Li
DOI: https://doi.org/10.1109/jiot.2023.3307820
IF: 10.6
2023-01-01
IEEE Internet of Things Journal
Abstract: Mobile devices are increasingly capable of delivering intelligent services by leveraging deep learning models such as deep neural networks (DNNs). However, because these tasks are compute-intensive, mobile devices often struggle to handle them independently, which has motivated collaborative inference as a promising route to low-latency mobile intelligence. Despite its potential, many challenges must be addressed before the benefits of inference acceleration can be fully realized. This paper presents a collaborative device-edge inference optimization framework for inference acceleration. The framework comprises fundamental modules, including the Parameters Generator, Accuracy Predictor, Delay Calculator, and Optimizer, which are designed to identify the optimal set of parameters for Model Compression, DNN Partition, and Feature Compression. To illustrate its implementation, a deep convolutional neural network (CNN) is used as an example, and collaborative inference latency optimization is formulated as a mixed-integer programming problem. A specific algorithm instance that uses a quantum-inspired optimizer within the framework is then presented. A multiple regression-based inference accuracy prediction model is proposed to keep inference accuracy close to that of the original network while significantly reducing the time consumed in the offline phase. In simulation scenarios involving AlexNet and RegNet inference tasks on CIFAR-10, under diverse hardware computation specifications and wireless link conditions, the proposed framework achieves greater inference acceleration than the methods it is compared against.
Computer Science, Information Systems; Telecommunications; Engineering, Electrical & Electronic
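
The abstract describes choosing a DNN partition point and a feature compression setting to minimize end-to-end latency. The following is a minimal sketch of that idea, not the paper's method: it assumes a common three-term latency model (device compute, wireless transmission, edge compute) and uses exhaustive search in place of the quantum-inspired optimizer. The names `Layer`, `end_to_end_latency`, and `best_split`, along with all numeric values, are hypothetical, and accuracy constraints are omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Layer:
    flops: float     # floating-point operations needed by this layer (assumed known)
    out_bits: float  # size of the layer's output feature map, in bits


def end_to_end_latency(layers: List[Layer], split: int, device_flops: float,
                       edge_flops: float, bandwidth_bps: float,
                       input_bits: float, compression: float = 1.0) -> float:
    """Latency when layers[:split] run on the device and layers[split:] on the edge.

    Assumption: `compression` is the fraction of the payload that remains
    after compression (1.0 means no compression).
    """
    device_time = sum(l.flops for l in layers[:split]) / device_flops
    edge_time = sum(l.flops for l in layers[split:]) / edge_flops
    if split == len(layers):
        tx_bits = 0.0                         # fully local: nothing is uploaded
    elif split == 0:
        tx_bits = input_bits                  # fully offloaded: send the raw input
    else:
        tx_bits = layers[split - 1].out_bits  # send the intermediate feature map
    tx_time = tx_bits * compression / bandwidth_bps
    return device_time + tx_time + edge_time


def best_split(layers: List[Layer], **kwargs) -> Tuple[float, int]:
    """Exhaustively try every partition point; return (latency, split)."""
    return min((end_to_end_latency(layers, s, **kwargs), s)
               for s in range(len(layers) + 1))


# Hypothetical three-layer network and link parameters.
layers = [Layer(2e9, 8e6), Layer(4e9, 1e6), Layer(2e9, 1e3)]
params = dict(device_flops=1e9, edge_flops=1e10,
              bandwidth_bps=1e6, input_bits=2.4e7)
latency, split = best_split(layers, **params)
latency_c, split_c = best_split(layers, compression=0.5, **params)
```

In this toy setting, halving the transmitted payload moves the best partition point one layer earlier, which suggests why partitioning and feature compression are searched jointly rather than one at a time.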