Accelerating Deep Neural Network Tasks Through Edge-Device Adaptive Inference

Xinyang Zhang, Yinglei Teng, Nan Wang, Boya Sun, Gang Hu
DOI: https://doi.org/10.1109/pimrc56721.2023.10293996
2023-01-01
Abstract: As a key technology of artificial intelligence (AI), Deep Neural Networks (DNNs) have been widely used in mobile applications, such as video analytics in autonomous driving. However, because of the constrained computation capabilities of mobile devices (MDs), it is challenging to meet the strict accuracy and real-time demands of DNN tasks, and failing to do so causes a serious drop in quality of service (QoS). A popular alternative is to offload DNN tasks to edge servers for intelligent inference. This, however, imposes a heavy communication burden because large amounts of raw data must be transmitted. In this paper, we propose an adaptive DNN co-inference (ADCI) strategy that achieves a flexible division of computation between devices and edge servers, with elastic execution, by combining early-exit and model-partition policies. We establish a balanced utility function and jointly optimize dynamic offloading and model adoption in a multi-user, multi-server edge computing system. To tackle the high coupling among the mixed variables, we propose a two-stage deep reinforcement learning (DRL) algorithm. The early-exit and model-partition decisions are tracked using the Lagrange method as a soft option. Simulation results show that the ADCI strategy performs well with timely accuracy.
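To make the interplay between early exit and model partition concrete, the following is a minimal Python sketch for a single user and a single edge server. It is not the paper's two-stage DRL algorithm or its Lagrange-based relaxation: it simply enumerates every (exit, partition) pair in a toy latency model and keeps the one with the highest utility. All names (`co_inference_latency`, `best_decision`), the utility form (accuracy minus weighted latency), and the numbers are assumptions introduced for illustration only.

```python
def co_inference_latency(layer_flops, out_bytes, input_bytes, exit_layer, partition,
                         device_flops_per_s, edge_flops_per_s, uplink_bytes_per_s):
    """Toy latency: layers [0, partition) run on the device,
    layers [partition, exit_layer) run on the edge server."""
    device_time = sum(layer_flops[:partition]) / device_flops_per_s
    edge_time = sum(layer_flops[partition:exit_layer]) / edge_flops_per_s
    if partition == exit_layer:
        tx_time = 0.0  # fully local inference, nothing is uploaded
    else:
        # Upload the raw input (partition == 0) or the intermediate feature map.
        payload = input_bytes if partition == 0 else out_bytes[partition - 1]
        tx_time = payload / uplink_bytes_per_s
    return device_time + tx_time + edge_time


def best_decision(layer_flops, out_bytes, input_bytes, exit_accuracy,
                  latency_weight, **rates):
    """Exhaustive search over exit points and partition points.
    Assumed utility: accuracy - latency_weight * latency."""
    best = None
    for exit_layer, accuracy in exit_accuracy.items():
        for partition in range(exit_layer + 1):
            latency = co_inference_latency(layer_flops, out_bytes, input_bytes,
                                           exit_layer, partition, **rates)
            utility = accuracy - latency_weight * latency
            if best is None or utility > best["utility"]:
                best = {"utility": utility, "exit_layer": exit_layer,
                        "partition": partition, "latency": latency}
    return best


# Hypothetical 4-layer model with early exits after layer 2 and layer 4.
decision = best_decision(
    layer_flops=[2e8, 4e8, 4e8, 2e8],   # per-layer compute cost
    out_bytes=[4e5, 1e5, 5e4, 1e3],     # size of each layer's output
    input_bytes=6e5,                    # size of the raw input
    exit_accuracy={2: 0.80, 4: 0.92},   # assumed accuracy at each exit
    latency_weight=0.5,
    device_flops_per_s=1e9,
    edge_flops_per_s=1e10,
    uplink_bytes_per_s=1e6,
)
print(decision)
```

With these toy numbers the search selects the final exit with the split after the first layer: running one layer locally shrinks the upload enough to beat both full offloading and fully local execution. Exhaustive search is only feasible here because the example has one user and a handful of choices; the coupled multi-user, multi-server setting described in the abstract is what motivates a learning-based approach.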