Delay-Aware Stochastic Resource Management for Mobile Edge Computing Systems Via Constrained Reinforcement Learning

Chang Tian, An Liu, Wu Luo
DOI: https://doi.org/10.1109/lwc.2021.3112984
IF: 6.3
2021-01-01
IEEE Wireless Communications Letters
Abstract: We design a joint radio and computational resource allocation policy for a multi-user mobile edge computing system that minimizes the expected power consumption while satisfying long-term delay constraints. The problem is formulated as a constrained Markov decision process (CMDP), which is solved efficiently by the proposed constrained reinforcement learning (CRL) algorithm, called successive convex programming based policy optimization (SCPPO). SCPPO solves a convex objective/feasibility surrogate problem at each update, and it provably converges to a Karush-Kuhn-Tucker (KKT) point of the original CMDP problem almost surely under some mild conditions. Moreover, SCPPO adopts an application-specific policy architecture and employs a data-efficient estimation strategy that can reuse old experiences, so that it achieves fast learning with low computational complexity.
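For orientation, a CMDP of the kind described in the abstract can be sketched in a generic form. The notation below is illustrative and not taken from the paper: \theta parameterizes the policy \pi_\theta, P(s_t, a_t) is a hypothetical per-step power cost, D_k(s_t, a_t) is a hypothetical per-step delay cost for user k, and d_k is that user's delay budget. The long-run average form is an assumption, since the abstract does not state whether average or discounted costs are used.

```latex
% Generic CMDP sketch (assumed notation, not the paper's exact formulation)
\begin{aligned}
\min_{\theta}\quad
  & J_0(\theta) = \lim_{T\to\infty} \frac{1}{T}\,
    \mathbb{E}_{\pi_\theta}\!\left[\sum_{t=0}^{T-1} P(s_t, a_t)\right] \\
\text{s.t.}\quad
  & J_k(\theta) = \lim_{T\to\infty} \frac{1}{T}\,
    \mathbb{E}_{\pi_\theta}\!\left[\sum_{t=0}^{T-1} D_k(s_t, a_t)\right] \le d_k,
    \qquad k = 1, \dots, K
\end{aligned}
```

In this reading, the objective J_0 captures the expected power consumption and each constraint J_k \le d_k captures one long-term delay requirement. Both are generally non-convex in \theta, which is why an algorithm such as SCPPO would replace them with convex surrogates at each update.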