Abstract: Many applications of reinforcement learning can be formalized as goal-conditioned environments, where, in each episode, there is a "goal" that affects the rewards obtained during that episode but does not affect the dynamics. Various techniques have been proposed to improve performance in goal-conditioned environments, such as automatic curriculum generation and goal relabeling. In this work, we explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation. In particular: the current Q-value function and the target Q-value estimate are both functions of the goal, and we would like to train the Q-value function to match its target for all goals. We therefore apply Gradient-Based Attention Transfer (Zagoruyko and Komodakis 2017), a knowledge distillation technique, to the Q-function update. We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional. We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals, where the agent can attain a reward by achieving any one of a large set of objectives, all specified at test time. Finally, to provide theoretical support, we give examples of classes of environments where (under some assumptions) standard off-policy algorithms such as DDPG require at least O(d^2) replay buffer transitions to learn an optimal policy, while our proposed technique requires only O(d) transitions, where d is the dimensionality of the goal and state space. Code is available at https://github.com/alevine0/ReenGAGE.
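
The idea of matching the Q-function to its target "for all goals" can be illustrated with a small sketch. It is not taken from the linked repository; it only assumes that the update adds a penalty on the difference between the goal-gradients of the current Q-value and of the target, next to the usual squared error. The names `grad_wrt_goal`, `gradient_matching_loss`, `q_fn`, `target_fn`, and the weight `alpha` are hypothetical, the state and action are assumed fixed and folded into the two functions, and gradients are approximated by finite differences rather than automatic differentiation.

```python
import numpy as np


def grad_wrt_goal(f, g, eps=1e-5):
    """Central finite-difference gradient of a scalar function f at goal g."""
    g = np.asarray(g, dtype=float)
    grad = np.zeros_like(g)
    for i in range(g.size):
        step = np.zeros_like(g)
        step[i] = eps
        grad[i] = (f(g + step) - f(g - step)) / (2.0 * eps)
    return grad


def gradient_matching_loss(q_fn, target_fn, g, alpha=1.0):
    """Squared value error plus alpha times the squared goal-gradient mismatch.

    q_fn and target_fn are hypothetical stand-ins for the current Q-value and
    the target estimate, both viewed as functions of the goal only.
    """
    g = np.asarray(g, dtype=float)
    value_term = (q_fn(g) - target_fn(g)) ** 2
    grad_gap = grad_wrt_goal(q_fn, g) - grad_wrt_goal(target_fn, g)
    return float(value_term + alpha * (grad_gap @ grad_gap))


# Toy linear example (assumed for illustration): Q(g) = w . g, target(g) = v . g
w = np.array([1.0, 2.0])
v = np.array([1.0, 0.0])
goal = np.array([0.5, 0.5])

loss = gradient_matching_loss(lambda g: w @ g, lambda g: v @ g, goal, alpha=0.5)
same = gradient_matching_loss(lambda g: w @ g, lambda g: w @ g, goal, alpha=0.5)
```

In this toy case the value error alone only constrains Q at the single sampled goal, while the gradient term also penalizes disagreement in every goal direction around it. This is one informal way to read the claim that fewer transitions may be needed when the goal space is high-dimensional.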

Learning To Walk With Prior Knowledge

Shaping in Reinforcement Learning Via Knowledge Transferred from Human-Demonstrations

Transferring knowledge from human-demonstration trajectories to reinforcement learning

Transfer in Deep Reinforcement Learning using Knowledge Graphs

Shaping in Reinforcement Learning by Knowledge Transferred from Human-Demonstrations of a Simple Similar Task

Accelerating deep reinforcement learning via knowledge-guided policy network

Mutual Information Based Knowledge Transfer Under State-Action Dimension Mismatch

Modular Deep Q Networks for Sim-to-real Transfer of Visuo-motor Policies

Injecting Prior Knowledge for Transfer Learning into Reinforcement Learning Algorithms using Logic Tensor Networks

Exploration in Knowledge Transfer Utilizing Reinforcement Learning

Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models

Bayesian Transfer Reinforcement Learning with Prior Knowledge Rules

Goal-Conditioned Q-Learning as Knowledge Distillation

Introspective Action Advising for Interpretable Transfer Learning

Knowledge Transfer in Deep Reinforcement Learning via an RL-Specific GAN-Based Correspondence Function

DGTRL: Deep graph transfer reinforcement learning method based on fusion of knowledge and data

Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning

Efficient Open-world Reinforcement Learning via Knowledge Distillation and Autonomous Rule Discovery

A Framework for Few-Shot Policy Transfer through Observation Mapping and Behavior Cloning

Jointly Pre-training with Supervised, Autoencoder, and Value Losses for Deep Reinforcement Learning

Learning state correspondence of reinforcement learning tasks for knowledge transfer