Self-Guided Actor-Critic: Reinforcement Learning from Adaptive Expert Demonstrations

Haoran Zhang, Chenkun Yin, Yanxin Zhang, Shangtai Jin
DOI: https://doi.org/10.1109/cdc42340.2020.9304004
2020-01-01
Abstract: This paper develops a novel Reinforcement Learning from Demonstration (RLfD) algorithm, called Self-Guided Actor-Critic (SGAC), which enhances exploration in Reinforcement Learning by imitating expert policies from demonstration data. Whereas most existing methods rely on human experts or other control algorithms for demonstrations, SGAC generates high-quality expert data online by continuously querying and adaptively updating an expert given by model predictive Deep Deterministic Policy Gradient (MP-DDPG). In this way, the training cost of SGAC is reduced, and the distribution mismatch problem, which leads to an unstable learning process, is alleviated. In addition, the optimality assumption on the expert in typical RLfD methods is relaxed, since the adaptive expert of SGAC can improve itself. A non-trivial example of applying SGAC to a ship berthing control problem is presented. The simulation results show that the learning process of SGAC is faster and steadier than that of typical Reinforcement Learning algorithms and MP-DDPG.