Abstract: Imitation learning is a widely used paradigm for decision making that learns from expert demonstrations. Existing imitation algorithms often require multiple interactions between the agent and the environment from which the demonstrations are obtained. Acquiring expert demonstrations in a simulator usually requires specialized knowledge, while real-world interactions are limited by security or cost concerns. Directly applying existing imitation learning algorithms in either the real world or a simulator is therefore not ideal. In this paper, we propose a cross-domain Inverse Reinforcement Learning training paradigm that learns a reward function from the demonstrations of an expert in a different domain, while limiting interaction with the environment in which the demonstrations were obtained. To address the distribution shift that arises under this paradigm, we propose a transfer learning method called off-dynamics Inverse Reinforcement Learning. The intuition is that the goal of reward function learning is not only to imitate the expert, but also to promote action adaptation to the difference in dynamics between the two domains. Specifically, we adopt a widely used Inverse Reinforcement Learning framework and modify its discriminator, which identifies agent-generated trajectories, with the quantified dynamics difference. Training the discriminator yields a transferable reward function suited to the target dynamics, which is guaranteed by our theoretical derivation. Off-dynamics Inverse Reinforcement Learning assigns higher rewards to demonstration trajectories that do not exploit discrepancies between the two domains. Extensive experiments on continuous control tasks demonstrate the method's effectiveness and its scalability to high-dimensional tasks. Our code is available on the project website: https://github.com/yachenkang/ODIRL.
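To make the idea of a discriminator "modified with quantified dynamics differences" concrete, the following is a minimal sketch and not the authors' implementation. It assumes an AIRL-style discriminator whose logit is f(s, a, s') - log pi(a|s), and it assumes the dynamics difference is quantified as a log-ratio of transition probabilities between the demonstration domain and the training domain, added to that logit. The function names, the additive form, and the sign convention are all illustrative assumptions; the abstract does not specify them.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def airl_discriminator(f_value: float, log_pi: float) -> float:
    """AIRL-style discriminator: exp(f) / (exp(f) + pi(a|s)).

    Equivalent to sigmoid(f - log pi). This is the unmodified baseline.
    """
    return sigmoid(f_value - log_pi)


def dynamics_gap(log_p_demo_domain: float, log_p_training_domain: float) -> float:
    """Hypothetical quantified dynamics difference for one transition.

    Assumed here to be the log-ratio of transition probabilities
    log p_demo(s'|s,a) - log p_train(s'|s,a). In practice such a ratio
    would have to be estimated; here it is passed in directly.
    """
    return log_p_demo_domain - log_p_training_domain


def adjusted_discriminator(f_value: float, log_pi: float, delta: float) -> float:
    """Discriminator with the dynamics gap added to the logit (an assumption).

    With delta == 0 (identical dynamics) this reduces to the AIRL baseline.
    """
    return sigmoid(f_value + delta - log_pi)


if __name__ == "__main__":
    f_value, log_pi = 0.3, -1.2
    # A transition that is more likely under the demonstration domain's dynamics.
    delta = dynamics_gap(log_p_demo_domain=-0.5, log_p_training_domain=-2.0)
    print("baseline:", airl_discriminator(f_value, log_pi))
    print("adjusted:", adjusted_discriminator(f_value, log_pi, delta))
```

Under these assumptions, a transition that is plausible in both domains leaves the discriminator output close to the baseline, while a transition that is far more likely in one domain than the other shifts it, which is one way a learned reward could avoid favoring behavior that exploits the discrepancy.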

Shaping in Reinforcement Learning Via Knowledge Transferred from Human-Demonstrations

Shaping in Reinforcement Learning by Knowledge Transferred from Human-Demonstrations of a Simple Similar Task.

Transferring knowledge from human-demonstration trajectories to reinforcement learning

Off-Dynamics Inverse Reinforcement Learning

DGTRL: Deep graph transfer reinforcement learning method based on fusion of knowledge and data

Knowledge Sharing and Transfer via Centralized Reward Agent for Multi-Task Reinforcement Learning

Learning To Walk With Prior Knowledge

Mutual Information Based Knowledge Transfer Under State-Action Dimension Mismatch

Shaping Reward Learning Approach from Passive Samples

Hierarchical Reinforcement Learning from Demonstration via Reachability-Based Reward Shaping

Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping

Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models

A new Potential-Based Reward Shaping for Reinforcement Learning Agent

Efficient Deep Reinforcement Learning Through Policy Transfer.

Acquisition Slope Surface Walking for Humanoids via Transfer Learning

Improved Reinforcement Learning in Cooperative Multi-agent Environments Using Knowledge Transfer

A Framework for Few-Shot Policy Transfer through Observation Mapping and Behavior Cloning

Efficient Deep Reinforcement Learning Via Adaptive Policy Transfer

Foundations for Transfer in Reinforcement Learning: A Taxonomy of Knowledge Modalities

Model-Based Transfer Reinforcement Learning Based on Graphical Model Representations

KnowRU: Knowledge Reusing via Knowledge Distillation in Multi-agent Reinforcement Learning