Collaborative DNNs Inference with Joint Model Partition and Compression in Mobile Edge-Cloud Computing Networks

Yaxin Tang, Xiuhua Li, Hui Li, Zhengyi Yang, Xiaofei Wang, Victor C. M. Leung
DOI: https://doi.org/10.1109/wcnc57260.2024.10571207
2024-01-01
Abstract: Mobile edge-cloud computing uses the computing resources of edge devices and cloud servers to execute complex deep neural networks (DNNs) through collaborative inference. However, many existing collaborative inference methods do not fully account for the limited resources of edge devices, which results in high inference latency. In this paper, we design an integrated computational framework that combines model partition and compression to reduce inference latency. Specifically, we partition a DNN model at an intermediate layer, deploying the layers before the partition point on the edge device and the layers after it on the cloud server. We propose a collaborative dual-agent reinforcement learning algorithm, called CPCDRL, to determine the partition point and compression ratios. It adaptively adjusts the compression ratios according to the chosen partition point, with the overall goal of minimizing the inference latency of the entire DNN model. Compared with the baseline schemes, the proposed algorithm can significantly reduce computational latency while minimizing accuracy loss.
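To make the interaction between partition point and compression ratio concrete, the following is a minimal sketch and not the CPCDRL algorithm itself. It replaces the reinforcement learning agents with an exhaustive search over partition points, and it assumes a simple latency model (edge compute + transmission + cloud compute) in which compression scales edge-side computation linearly. The names `Layer`, `total_latency`, and `best_partition`, as well as all numeric values, are hypothetical; accuracy loss is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    flops: float      # compute cost of the layer (hypothetical value)
    out_bytes: float  # size of the layer's output activation in bytes


def total_latency(layers, input_bytes, p, ratio, edge_speed, cloud_speed, bandwidth):
    """Latency when layers[:p] run on the edge and layers[p:] run on the cloud.

    Assumption: `ratio` in (0, 1] scales the edge-side FLOPs linearly,
    standing in for the effect of compressing the edge-side sub-model.
    """
    edge = sum(l.flops for l in layers[:p]) * ratio / edge_speed
    cloud = sum(l.flops for l in layers[p:]) / cloud_speed
    # Data sent over the network: the raw input if nothing runs on the edge,
    # otherwise the output of the last edge-side layer.
    sent = input_bytes if p == 0 else layers[p - 1].out_bytes
    return edge + sent / bandwidth + cloud


def best_partition(layers, input_bytes, ratio, edge_speed, cloud_speed, bandwidth):
    """Exhaustively search all partition points for a fixed compression ratio."""
    def latency(p):
        return total_latency(layers, input_bytes, p, ratio,
                             edge_speed, cloud_speed, bandwidth)
    p = min(range(len(layers) + 1), key=latency)
    return p, latency(p)


# Hypothetical three-layer model and system parameters.
layers = [Layer(4e9, 2e6), Layer(2e9, 5e5), Layer(1e9, 4e3)]
params = dict(input_bytes=6e5, edge_speed=1e10, cloud_speed=1e11, bandwidth=1e6)

uncompressed = best_partition(layers, ratio=1.0, **params)
compressed = best_partition(layers, ratio=0.5, **params)
```

With these made-up numbers, the uncompressed model is best offloaded entirely to the cloud (partition point 0), whereas halving the edge-side cost makes it faster to run all three layers on the edge (partition point 3). This illustrates why the two decisions are coupled and are determined jointly rather than one at a time.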