The Guiding Role of Reward Based on Phased Goal in Reinforcement Learning.

Yiming Liu, Zheng Hu
DOI: https://doi.org/10.1145/3383972.3384039
2020-01-01
Abstract: Sparse and delayed rewards greatly hinder deep reinforcement learning, which is supposed to acquire the optimal policy by learning from trajectories. Reward shaping, previously introduced to accelerate learning, is one of the most effective methods for tackling this crucial yet challenging problem. However, how to implement reward shaping reasonably remains to be explored. Current reward shaping methods usually require a large number of expert demonstrations and explore the environment poorly. In this paper, we propose a reward shaping method, a reinforcement learning framework based on phased goals, which accelerates learning convergence with fewer expert examples and explores better, especially in tasks where environment rewards are particularly sparse. The framework consists of a reward based on phased goals and policy learning using PPO2. Acquiring the designed reward is divided into two steps: stage classification and calculation of goal proximity. Experiments show that our method effectively alleviates the sparse-reward problem and obtains higher scores in Atari games than the basic algorithm.
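The two-step reward described in the abstract (stage classification, then goal proximity) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the exponential proximity measure, the toy threshold classifier, and the `weight` parameter are all assumptions chosen for illustration, since the abstract does not specify how either step is computed.

```python
import math
from typing import Callable, Sequence

State = Sequence[float]


def goal_proximity(state: State, goal: State) -> float:
    """Assumed proximity measure: maps Euclidean distance into (0, 1].

    A value of 1.0 means the state coincides with the goal.
    """
    dist = math.sqrt(sum((s - g) ** 2 for s, g in zip(state, goal)))
    return math.exp(-dist)


def toy_stage_classifier(goals: Sequence[State]) -> Callable[[State], int]:
    """Hypothetical stand-in for a learned stage classifier.

    Counts how many phased goals have been passed along the first
    coordinate and clips the result to the last stage index.
    """
    def classify(state: State) -> int:
        passed = sum(1 for g in goals if state[0] >= g[0])
        return min(passed, len(goals) - 1)
    return classify


def shaped_reward(env_reward: float, state: State, goals: Sequence[State],
                  classify: Callable[[State], int], weight: float = 0.1) -> float:
    """Environment reward plus an assumed phased-goal bonus.

    The bonus grows with the current stage index and with the proximity
    to that stage's goal, so progress is rewarded even when the
    environment reward is zero.
    """
    stage = classify(state)
    bonus = stage + goal_proximity(state, goals[stage])
    return env_reward + weight * bonus


goals = [(1.0,), (2.0,)]            # hypothetical phased goals
classify = toy_stage_classifier(goals)
early = shaped_reward(0.0, (0.5,), goals, classify)
late = shaped_reward(0.0, (1.5,), goals, classify)
```

In this toy setup `late` exceeds `early` although the environment reward is zero in both cases, which is the kind of dense guidance signal that a policy-gradient learner such as PPO2 could then optimize.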