Learning To Walk With Prior Knowledge

Martin Gottwald, Dominik Meyer, Hao Shen, Klaus Diepold
DOI: https://doi.org/10.1109/AIM.2017.8014209
2017-01-01
Abstract: In this work, a novel approach to Transfer Learning for use in Deep Reinforcement Learning is introduced. The agent is realized as an actor-critic framework, namely the Deep Deterministic Policy Gradient algorithm. The Q-function and the policy are represented as deep feed-forward networks that are trained by minimizing the mean squared Bellman error and by maximizing the expected reward, respectively. For Transfer Learning, the actor is modified with a new regularization term, called the knowledge regularizer. It allows prior knowledge, in the form of an existing policy, to be included in the learning process. The knowledge regularizer shifts the current weight vector during the gradient descent step towards a region of the weight space that is centered around the existing policy. Because neural networks are universal and smooth function approximators, the weights of the existing policy and the new ones have to lie close to each other in the weight space. Solving a task therefore benefits from the prior knowledge when it is used to manipulate the gradient given by the critic. We could experimentally verify that the knowledge regularizer results in a higher performance achieved by the agent and in a reduction of the learning time. Furthermore, the knowledge regularizer can be used as a replacement for labeled training data, which renders it especially useful for physical applications.
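
The weight shift described in the abstract can be illustrated with a minimal sketch. The abstract does not state the regularizer's exact form, so the quadratic penalty (lam / 2) * ||theta - theta_prior||^2 used here is an assumption, as are the names regularized_actor_step, critic_grad, theta_prior, lr, and lam. The paper's actual term may differ.

```python
import numpy as np

def regularized_actor_step(theta, critic_grad, theta_prior, lr=0.01, lam=0.1):
    """One actor update with an ASSUMED quadratic knowledge penalty.

    theta       -- current actor weights (flattened)
    critic_grad -- gradient of the expected reward w.r.t. theta, from the critic
    theta_prior -- weights of the existing (prior) policy
    lr, lam     -- hypothetical step size and regularization strength
    """
    # Gradient of (lam / 2) * ||theta - theta_prior||^2 with respect to theta.
    penalty_grad = lam * (theta - theta_prior)
    # Ascend the critic's gradient while descending the penalty,
    # which pulls theta towards the prior policy's weights.
    return theta + lr * (critic_grad - penalty_grad)

theta = np.array([1.0, -2.0, 0.5])
theta_prior = np.zeros(3)

# With a zero critic gradient, the step only shrinks the distance to the prior.
new_theta = regularized_actor_step(theta, np.zeros(3), theta_prior)
```

In this sketch lam trades off the two signals: with lam = 0 the update reduces to the plain critic gradient, while a larger lam keeps the actor closer to the existing policy.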