Reinforcement-learning-based Wireless Resource Allocation

Rui Wang
DOI: https://doi.org/10.1049/pbte081e_ch11
2019-01-01
Abstract: This chapter focuses on formulating radio resource management (RRM) as a Markov decision process (MDP). Convex optimization has been widely used for RRM over short time durations, where the wireless channel is assumed to be quasi-static. These problems are usually referred to as deterministic optimization problems. MDP, on the other hand, is an elegant and powerful tool for optimizing the resources of wireless systems over a longer timescale, where the random transitions of the system and channel states are taken into account. These problems are usually referred to as stochastic optimization problems. In particular, MDP is well suited to the joint optimization of the physical and medium-access control (MAC) layers. Building on MDP, reinforcement learning is a practical method for solving the optimization without a priori knowledge of the system statistics. The chapter first introduces some basics of stochastic approximation, which serves as one foundation of reinforcement learning, and then demonstrates MDP formulations of RRM through case studies that require knowledge of the system statistics. Finally, some reinforcement learning approaches (e.g., Q-learning) are introduced to address the practical issue of unknown system statistics.
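To illustrate how Q-learning can handle an MDP whose statistics are unknown, the following is a minimal sketch on a toy transmission-scheduling problem. It is not taken from the chapter: the state (queue length, channel quality), the cost (queue backlog plus transmit power), and every constant and function name below are hypothetical choices made only for illustration. The learner observes costs and state transitions but never reads the arrival or channel probabilities, and it uses a diminishing step size of the kind studied in stochastic approximation.

```python
import random

# All values below are hypothetical, chosen only to make the sketch concrete.
Q_MAX = 5                      # buffer size in packets
ARRIVAL_P = 0.4                # Bernoulli packet-arrival probability
GOOD_P = 0.5                   # probability that the channel is good in a slot
POWER_COST = {0: 3.0, 1: 1.0}  # transmit cost in a bad (0) / good (1) channel
GAMMA = 0.9                    # discount factor


def step(queue, channel, action, rng):
    """Simulate one slot. The learner only sees the returned sample."""
    sent = 1 if (action == 1 and queue > 0) else 0
    # Backlog cost plus power cost (power is paid even if the queue is empty).
    cost = queue + (POWER_COST[channel] if action == 1 else 0.0)
    arrival = 1 if rng.random() < ARRIVAL_P else 0
    next_queue = min(Q_MAX, queue - sent + arrival)
    next_channel = 1 if rng.random() < GOOD_P else 0
    return cost, next_queue, next_channel


def q_learning(num_steps=200_000, epsilon=0.1, seed=0):
    """Tabular Q-learning for discounted cost minimization."""
    rng = random.Random(seed)
    q = {(b, h, a): 0.0
         for b in range(Q_MAX + 1) for h in (0, 1) for a in (0, 1)}
    visits = dict.fromkeys(q, 0)
    queue, channel = 0, 1
    for _ in range(num_steps):
        # Epsilon-greedy exploration; greedy means lowest estimated cost.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = min((0, 1), key=lambda a: q[(queue, channel, a)])
        cost, next_queue, next_channel = step(queue, channel, action, rng)
        key = (queue, channel, action)
        visits[key] += 1
        # Diminishing step size: sum diverges, sum of squares converges.
        alpha = 1.0 / visits[key] ** 0.7
        target = cost + GAMMA * min(q[(next_queue, next_channel, 0)],
                                    q[(next_queue, next_channel, 1)])
        q[key] += alpha * (target - q[key])
        queue, channel = next_queue, next_channel
    return q


def greedy_policy(q):
    """Map each (queue, channel) state to the action with the lowest Q-value."""
    return {(b, h): min((0, 1), key=lambda a: q[(b, h, a)])
            for b in range(Q_MAX + 1) for h in (0, 1)}


if __name__ == "__main__":
    policy = greedy_policy(q_learning())
    for state in sorted(policy):
        print(state, "transmit" if policy[state] == 1 else "idle")
```

Running the script prints the learned action for each (queue, channel) state. If the arrival and channel probabilities were known, the same toy problem could instead be solved directly from the MDP formulation, for example by value iteration.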