Deep Deterministic Policy Gradient with Prioritized Sampling for Power Control

Shiyang Zhou, Yufan Cheng, Xia Lei, Huanhuan Duan
DOI: https://doi.org/10.1109/access.2020.3033333
IF: 3.9
2020-01-01
IEEE Access
Abstract: Reinforcement learning is a technique for power control in wireless communications. However, most research has focused on the deep Q-network (DQN) scheme, which outputs a Q-value for each discrete action and therefore does not match the continuous power control problem. Hence, this paper proposes a deep deterministic policy gradient (DDPG) scheme for power control. The power selection policy, designated the actor, is approximated by a convolutional neural network (CNN), and the evaluation of the policy, designated the critic, is approximated by a fully connected network. These deep neural networks enable fast decision-making for large-scale power control problems. Moreover, to speed up training, this paper proposes a prioritized sampling technique, which samples the experiences that need to be learned with a higher probability. The proposed algorithm is simulated in a multiple sweep interference (MSI) scenario. The simulation results show that the DDPG scheme is more likely to achieve the optimal policy than the DQN scheme. In addition, the DDPG scheme with prioritized sampling (DDPG-PS) converges faster than the DDPG scheme with uniform sampling.
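The contrast between prioritized and uniform sampling can be illustrated with a minimal sketch. The abstract does not state how the priority of an experience is defined, so the code below assumes, purely for illustration, that each stored experience carries a non-negative priority (for example, the magnitude of its temporal-difference error) and that the sampling probability is proportional to that priority raised to a power. The function names and the parameters alpha and eps are hypothetical and are not taken from the paper.

```python
import random


def sampling_probabilities(priorities, alpha=0.6, eps=1e-6):
    """Return one sampling probability per stored experience.

    Assumption (not from the paper): the probability is proportional to
    (|priority| + eps) ** alpha. With alpha = 0 this reduces to uniform
    sampling; a larger alpha favors high-priority experiences more strongly.
    """
    weights = [(abs(p) + eps) ** alpha for p in priorities]
    total = sum(weights)
    return [w / total for w in weights]


def prioritized_sample(priorities, batch_size, alpha=0.6, eps=1e-6, rng=None):
    """Draw batch_size buffer indices (with replacement) by priority."""
    rng = rng or random.Random()
    probs = sampling_probabilities(priorities, alpha=alpha, eps=eps)
    # random.choices normalizes the weights itself; probabilities are passed
    # here only to keep the two helpers consistent with each other.
    return rng.choices(range(len(priorities)), weights=probs, k=batch_size)


if __name__ == "__main__":
    # Hypothetical priorities for three stored experiences.
    priorities = [0.1, 2.0, 0.5]
    print(sampling_probabilities(priorities))
    print(prioritized_sample(priorities, batch_size=8, rng=random.Random(0)))
```

Under these assumptions, the experience with the largest priority is drawn most often, while the small eps term keeps every experience reachable. Setting alpha to 0 recovers the uniform-sampling baseline that the abstract compares against.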
What problem does this paper attempt to address?