Periodic Guidance Learning

Lipeng Wan, Xuguang Lan, Xuwei Song, Chuzhen Feng, Nanning Zheng
DOI: https://doi.org/10.1109/ICBK50248.2020.00021
2020-01-01
Abstract: Tasks with periodic states are widespread in reality. However, current reinforcement learning (RL) algorithms generally treat such tasks as non-periodic Markov decision processes, which results in low exploration efficiency and misleading advantage estimation with high variance. This paper proposes periodic guidance learning (PGL), in which a pruned advantage estimation with lower variance is implemented. Meanwhile, based on periodic states, past good experiences are utilized for better exploration. Our algorithm is evaluated on periodic tasks in MuJoCo. The experimental results show that the PGL method improves exploration efficiency and outperforms baselines in various periodic tasks. The results also show that PGL achieves smooth policy optimization. Further experiments on the agent's periodic behavior reveal a strong correlation between period length and the agent's motion mode.