Qualitative Measurements of Policy Discrepancy for Return-Based Deep Q-Network

Wenjia Meng, Qian Zheng, Long Yang, Pengfei Li, Gang Pan
DOI: https://doi.org/10.1109/tnnls.2019.2948892
IF: 14.255
2020-10-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: The deep Q-network (DQN) and return-based reinforcement learning are two promising algorithms proposed in recent years. The DQN brings advances to complex sequential decision problems, while return-based algorithms have advantages in making use of sample trajectories. In this brief, we propose a general framework to combine the DQN and most of the return-based reinforcement learning algorithms, named R-DQN. We show that the performance of the traditional DQN can be significantly improved by introducing return-based algorithms. In order to further improve the R-DQN, we design a strategy with two measurements to qualitatively measure the policy discrepancy. We conduct experiments on several representative tasks from the OpenAI Gym and Atari games. The state-of-the-art performance achieved by our method with this proposed strategy validates its effectiveness.
computer science, artificial intelligence, theory & methods, engineering, electrical & electronic, hardware & architecture