Abstract: Reinforcement learning (RL) enables sequential decision-making in complex, high-dimensional environments through interaction with the environment. In most real-world applications, however, a large number of interactions is infeasible. In such settings, transfer RL algorithms, which transfer knowledge from one or more source environments to a target environment, have been shown to increase learning speed and improve both initial and asymptotic performance. However, most existing transfer RL algorithms are on-policy and sample-inefficient, fail in adversarial target tasks, and often require heuristic choices in algorithm design. This paper proposes an off-policy Advantage-based Policy Transfer algorithm, APT-RL, for fixed-domain environments. Its novelty lies in using the popular notion of "advantage" as a regularizer to weigh the knowledge that should be transferred from the source against new knowledge learned in the target, removing the need for heuristic choices. Further, we propose a new transfer performance measure to evaluate the performance of our algorithm and unify existing transfer RL frameworks. Finally, we present a scalable, theoretically backed task similarity measurement algorithm to illustrate how our proposed transferability measure aligns with the similarity between source and target environments. We compare APT-RL with several baselines, including existing transfer RL algorithms, in three high-dimensional continuous control tasks. Our experiments demonstrate that APT-RL outperforms existing transfer RL algorithms and is at least as good as learning from scratch in adversarial tasks.
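
To make the notion of "advantage" concrete, the following minimal sketch computes the standard advantage A(s, a) = Q(s, a) - V(s) for a single state with discrete actions, and then derives an illustrative transfer weight from it. The abstract does not specify how APT-RL forms its regularizer, so the function `transfer_weight`, its tanh squashing, and the discrete-action setting are assumptions made purely for illustration and should not be read as the paper's method.

```python
import math


def advantage(q_values, policy_probs):
    """Standard advantage for one state: A(s, a) = Q(s, a) - V(s),
    where V(s) = sum_a pi(a|s) * Q(s, a)."""
    v = sum(p * q for p, q in zip(policy_probs, q_values))
    return [q - v for q in q_values]


def transfer_weight(q_values, policy_probs, source_action):
    """Hypothetical helper (not from the paper): map the advantage of the
    action suggested by the source policy to a weight in [0, 1).
    A non-positive advantage yields 0, i.e. no knowledge is transferred."""
    adv = advantage(q_values, policy_probs)[source_action]
    return math.tanh(max(adv, 0.0))


# Target-task estimates for one state with three actions (made-up numbers).
q = [1.0, 2.0, 3.0]
pi = [0.25, 0.5, 0.25]

print(advantage(q, pi))           # advantage of each action
print(transfer_weight(q, pi, 2))  # source action looks good in the target
print(transfer_weight(q, pi, 0))  # source action looks bad in the target
```

The design intuition under these assumptions is that source knowledge is weighted more heavily when the source policy's action has a positive advantage under the target critic, and is ignored otherwise, which is one way a method could avoid being hurt by an adversarial source.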

Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement

SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning

Efficient Deep Reinforcement Learning Through Policy Transfer

Modular Deep Q Networks for Sim-to-real Transfer of Visuo-motor Policies

Efficient Deep Reinforcement Learning Via Adaptive Policy Transfer

Optimistic Linear Support and Successor Features as a Basis for Optimal Policy Transfer

Deep Reinforcement Learning for Autonomous Driving by Transferring Visual Features

Efficient Exploration for Multi-Agent Reinforcement Learning Via Transferable Successor Features

Successor Feature Neural Episodic Control

Safety-Constrained Policy Transfer with Successor Features

Shaping in Reinforcement Learning Via Knowledge Transferred from Human-Demonstrations

Advantages and Limitations of using Successor Features for Transfer in Reinforcement Learning

Policy Transfer Via Skill Adaptation and Composition

Grounding Language for Transfer in Deep Reinforcement Learning

An advantage based policy transfer algorithm for reinforcement learning with measures of transferability

SkillS: Adaptive Skill Sequencing for Efficient Temporally-Extended Exploration

Learning when to Transfer among Agents: an Efficient Multiagent Transfer Learning Framework

A Framework for Few-Shot Policy Transfer through Observation Mapping and Behavior Cloning

Self-Supervised Reinforcement Learning that Transfers using Random Features

A Transfer Approach Using Graph Neural Networks in Deep Reinforcement Learning

Combining Behaviors with the Successor Features Keyboard