DNN Real-Time Collaborative Inference Acceleration with Mobile Edge Computing

Run Yang, Yan Li, Hui He, Weizhe Zhang
DOI: https://doi.org/10.1109/ijcnn55064.2022.9892582
2022-01-01
Abstract: Collaborative inference splits a Deep Neural Network (DNN) model into two parts that run cooperatively on the end device and the cloud server, minimizing inference latency and protecting data privacy, especially in the 5G era. The optimal partitioning scheme depends on the available network bandwidth. However, in dynamic mobile networks, resource-constrained devices cannot execute complex model partitioning algorithms efficiently enough to obtain the optimal partition in real time. To overcome this challenge, we first formulate model partitioning as a min-cut problem whose solution is the optimal partition. Second, we propose CIC, a Collaborative Inference method based on model Compression. CIC reduces the complexity of the partitioning algorithm so that it runs efficiently on resource-constrained end devices. CIC generates a splitting model from the inherent characteristics of the DNN model and the platform resources. The splitting models are independent of the network environment, generated offline, and used continuously in the current environment. CIC compresses models effectively enough that even DNN models with hundreds of layers can be partitioned rapidly on resource-constrained devices. Experimental results show that our method is significantly more effective than existing solutions: in the best case it speeds up model partitioning decisions by up to 100x, reduces inference latency by up to 2.6x, and increases throughput by up to 3.3x.
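The abstract does not spell out the graph construction, so the following is only a minimal sketch of one common way to cast device/server partitioning as a min-cut problem. It is not the CIC method itself, and it includes no model compression. Each layer is linked to a device-side source with capacity equal to its server execution time and to a server-side sink with capacity equal to its device execution time. Each data dependency carries the transfer time of the producing layer's output. The function name `min_cut_partition`, the layer names, and all timings are hypothetical. The sketch also assumes a simple chain, ignores the cost of returning the result, and forbids sending data from the server back to the device.

```python
from collections import defaultdict, deque

INF = float("inf")


def min_cut_partition(layers, edges, device_ms, server_ms, transfer_ms):
    """Return (total latency, layers kept on the device) via an s-t min cut.

    Illustrative construction only (an assumption, not the paper's algorithm).
    """
    s, t = "__device__", "__server__"
    cap = defaultdict(lambda: defaultdict(float))
    for v in layers:
        cap[s][v] += server_ms[v]  # cut (paid) if v is placed on the server
        cap[v][t] += device_ms[v]  # cut (paid) if v is placed on the device
    for u, v in edges:
        cap[u][v] += transfer_ms[u]  # paid if u's output goes device -> server
        cap[v][u] += INF             # forbid server -> device dependencies

    flow = 0.0
    while True:
        # Edmonds-Karp: breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break  # no augmenting path left: the flow equals the min cut
        # Find the bottleneck capacity along the path, then push that flow.
        bottleneck, v = INF, t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

    # Nodes still reachable from the source form the device side of the cut.
    return flow, [v for v in layers if v in parent]


# Hypothetical four-node chain; "input" is pinned to the device by giving it
# an infinite server cost. All timings are made-up milliseconds.
layers = ["input", "conv1", "conv2", "fc"]
edges = [("input", "conv1"), ("conv1", "conv2"), ("conv2", "fc")]
device_ms = {"input": 0, "conv1": 40, "conv2": 60, "fc": 10}
server_ms = {"input": INF, "conv1": 4, "conv2": 6, "fc": 1}
transfer_ms = {"input": 50, "conv1": 8, "conv2": 30, "fc": 1}

latency, device_side = min_cut_partition(
    layers, edges, device_ms, server_ms, transfer_ms
)
print(latency, device_side)
```

With these made-up numbers the cheapest cut falls after `conv1`. Scaling every transfer time up, which models lower bandwidth, moves the cut so that all layers stay on the device. That illustrates why the partition depends on network bandwidth and has to be recomputed as conditions change.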