A Memory-Greedy Policy with Guaranteed Convergence for Accelerating Reinforcement Learning

Xinglin Yu, Yuhu Wu, Xi-Ming Sun, Wenya Zhou
DOI: https://doi.org/10.1115/1.4049539
2021-01-01
Abstract: Balancing exploration and exploitation in reinforcement learning is a common dilemma and a time-consuming task. In this paper, a novel exploration policy for Q-Learning, called the Memory-greedy policy, is proposed to accelerate learning. Through memory storage and playback, the probability of random action selection can be effectively reduced, which speeds up learning. The principle of the policy is analyzed in a maze scenario, and its theoretical convergence is established using dynamic programming.