Improving Policy Generalization for Teacher-Student Reinforcement Learning

Gong Xudong, Jia Hongda, Zhou Xing, Feng Dawei, Ding Bo, Xu Jie
DOI: https://doi.org/10.1007/978-3-030-55393-7_4
2020-01-01
Abstract: Teacher-student reinforcement learning is a popular approach that aims to accelerate the learning of new agents with advice from trained agents. In these methods, budgets are introduced to limit the amount of advice and prevent over-advising. However, existing budget-based methods tend to use up their budgets in the early training stage to help students learn initial policies quickly. As a result, the initial policies become somewhat solidified, which is not beneficial to policy generalization. In this paper, to avoid intensive advising in the early training stage, we enable advising throughout the entire training stage in a decreasing manner. Specifically, we integrate advice into reward signals and propose an advice-based extra reward method, and we integrate advice into exploration strategies and propose an advice-based modified epsilon method. Experimental results show that the proposed methods can effectively improve policy performance on general tasks without loss of learning speed.
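The abstract does not give the exact formulas, so the following is only a rough sketch of the two ideas under stated assumptions, not the authors' implementation. It assumes (1) a linear decay of the advising probability over the whole run instead of a fixed budget spent early, (2) a fixed bonus added to the environment reward when the student's action matches the teacher's advice, and (3) an epsilon-greedy rule whose exploratory steps follow the teacher with the current advising probability. The names `advice_probability`, `shaped_reward`, `select_action`, `p0`, and `bonus` are hypothetical.

```python
import random


def advice_probability(step: int, total_steps: int, p0: float = 0.5) -> float:
    """Hypothetical linear decay: advice remains available for the entire
    run but becomes less frequent, rather than being exhausted early."""
    frac = min(step / total_steps, 1.0)
    return p0 * (1.0 - frac)


def shaped_reward(env_reward: float, action: int, teacher_action: int,
                  bonus: float) -> float:
    """Assumed form of an advice-based extra reward: add a bonus when the
    student's action agrees with the teacher's advised action."""
    return env_reward + (bonus if action == teacher_action else 0.0)


def select_action(q_values, teacher_action: int, epsilon: float,
                  p_advice: float, rng: random.Random) -> int:
    """Assumed form of an advice-based modified epsilon-greedy rule:
    on an exploration step, follow the teacher with probability p_advice,
    otherwise pick a uniformly random action; exploit greedily otherwise."""
    if rng.random() < epsilon:
        if rng.random() < p_advice:
            return teacher_action
        return rng.randrange(len(q_values))
    # Greedy action (ties resolve to the lowest index).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

For example, with `total_steps=100` the advising probability starts at `p0` and falls to zero by the last step; scaling `bonus` by `advice_probability(...)` would be one way to make the extra reward decrease as well, though that choice is also an assumption.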