Abstract: Deep neural network (DNN) inference with low delay and high accuracy requirements is usually computation intensive. Collaboration between mobile devices and the network edge is a potential solution for supporting DNN inference. Moreover, the sampling rates of mobile devices can be dynamically configured to adapt to network conditions, which can be used to minimize the inference service delay. In this chapter, we first introduce the concept of DNN inference and its two underlying technologies, i.e., mobile edge computing and machine learning. Then, we present a case study on collaborative DNN inference via device-edge orchestration. Specifically, taking channel variation and task arrival randomness into consideration, we formulate the DNN inference delay minimization problem as a constrained Markov decision process (CMDP). In this problem, sampling rate adaptation, inference task offloading, and edge computing resource allocation are jointly optimized while guaranteeing the long-term accuracy requirements of different inference services. To solve the problem, we propose a learning-based solution with three steps. First, the CMDP is transformed into an MDP by leveraging the Lyapunov optimization technique. Second, a deep reinforcement learning (RL)-based algorithm is proposed to solve the transformed MDP. Third, an optimization subroutine is embedded in the proposed deep RL algorithm to directly obtain the optimal edge computing resource allocation, thereby expediting the training process. Simulation results demonstrate that the proposed algorithm can reduce the average service delay and preserve long-term inference accuracy with a high probability.

Keywords: DNN inference; Mobile edge computing; Reinforcement learning; Lyapunov optimization; Constrained Markov decision process; Adaptive rate sampling
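The first step, turning a long-term accuracy constraint into a per-slot objective, can be illustrated with a generic Lyapunov virtual queue. The following is a minimal sketch of the standard drift-plus-penalty construction, not the chapter's exact formulation: the function names, the accuracy target `target`, and the trade-off weight `v` are all assumptions introduced here for illustration.

```python
# Generic drift-plus-penalty sketch (an assumption, not the chapter's exact model).
# A virtual queue tracks the accumulated accuracy deficit; keeping it stable
# keeps the long-term average accuracy at or above the target.


def update_virtual_queue(queue, accuracy, target):
    """Grow the queue when the slot's accuracy is below target, drain it otherwise."""
    return max(queue + target - accuracy, 0.0)


def drift_plus_penalty_cost(queue, delay, accuracy, target, v):
    """Per-slot cost: v-weighted delay plus the queue-weighted accuracy deficit."""
    return v * delay + queue * (target - accuracy)


def simulate(accuracies, delays, target, v):
    """Run the queue over a trace and return the final queue and per-slot costs."""
    queue = 0.0
    costs = []
    for accuracy, delay in zip(accuracies, delays):
        # The cost uses the queue observed at the start of the slot.
        costs.append(drift_plus_penalty_cost(queue, delay, accuracy, target, v))
        queue = update_virtual_queue(queue, accuracy, target)
    return queue, costs


if __name__ == "__main__":
    final_queue, slot_costs = simulate(
        accuracies=[0.5, 1.0, 1.0], delays=[1.0, 2.0, 1.0], target=0.75, v=1.0
    )
    print(final_queue, slot_costs)
```

In a sketch like this, the negative of the per-slot cost could serve as the reward of the transformed MDP, so that an RL agent trades off delay against the accumulated accuracy deficit. A larger `v` puts more weight on delay, while a smaller `v` prioritizes the accuracy constraint.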

Collaborative Deep Neural Network Inference via Mobile Edge Computing