Abstract: Imitation learning is a widely used paradigm for decision making that learns from expert demonstrations. Existing imitation learning algorithms often require many interactions between the agent and the environment in which the demonstrations were collected. Acquiring expert demonstrations in a simulator usually requires specialized knowledge, while real-world interactions are limited by security or cost concerns. Directly applying existing imitation learning algorithms in either the real world or a simulator is therefore not ideal. In this paper, we propose a cross-domain Inverse Reinforcement Learning training paradigm that learns a reward function from a hetero-domain expert's demonstrations while limiting interaction with the environment in which the demonstrations were obtained. To address the distribution shift that arises under this paradigm, we propose a transfer learning method called off-dynamics Inverse Reinforcement Learning. The intuition is that the goal of reward function learning is not only to imitate the expert, but also to promote action adaptation to the dynamics difference between the two domains. Specifically, we adopt a widely used Inverse Reinforcement Learning framework and modify its discriminator, which identifies agent-generated trajectories, with a quantified dynamics difference. Training the discriminator yields a transferable reward function suited to the target dynamics, as guaranteed by our theoretical derivation. Off-dynamics Inverse Reinforcement Learning assigns higher rewards to demonstration trajectories that do not exploit discrepancies between the two domains. Extensive experiments on continuous control tasks demonstrate the method's effectiveness and its scalability to high-dimensional tasks. Our code is available on the project website: https://github.com/yachenkang/ODIRL.
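
The idea of a discriminator "modified with quantified dynamic differences" can be pictured with a small numerical sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: an AIRL-style discriminator whose logit is f(s, a) minus log pi(a|s), a dynamics gap estimated from two hypothetical domain classifiers (one over (s, a, s'), one over (s, a)), and the choice to add that gap to the logit. The function names are illustrative and are not taken from the authors' code.

```python
import numpy as np


def dynamics_gap(logit_sas, logit_sa):
    """Assumed estimate of log p_target(s'|s,a) - log p_source(s'|s,a).

    Both inputs are log-odds (target vs. source) from two hypothetical
    domain classifiers: one sees (s, a, s'), the other sees only (s, a).
    Subtracting them cancels the (s, a) marginal by Bayes' rule.
    """
    return np.asarray(logit_sas, dtype=float) - np.asarray(logit_sa, dtype=float)


def discriminator_prob(f_value, log_pi, gap):
    """AIRL-style discriminator with an additive dynamics term (assumption).

    With gap = 0 this reduces to exp(f) / (exp(f) + pi(a|s)).
    """
    logit = np.asarray(f_value, dtype=float) + gap - np.asarray(log_pi, dtype=float)
    return 1.0 / (1.0 + np.exp(-logit))


def discriminator_loss(d_expert, d_agent, eps=1e-8):
    """Binary cross-entropy: expert samples labeled 1, agent samples labeled 0."""
    d_expert = np.asarray(d_expert, dtype=float)
    d_agent = np.asarray(d_agent, dtype=float)
    return -(np.mean(np.log(d_expert + eps)) + np.mean(np.log(1.0 - d_agent + eps)))


if __name__ == "__main__":
    gap = dynamics_gap([2.0, -1.0], [0.5, 0.5])
    probs = discriminator_prob(f_value=[0.0, 0.0], log_pi=[0.0, 0.0], gap=gap)
    print(gap, probs)
```

In this sketch, a transition that is more likely under the target dynamics than under the source dynamics receives a positive gap and therefore a higher discriminator output, which is one way to read the claim that trajectories exploiting domain discrepancies should be rewarded less. The sign and placement of the gap term are guesses; the linked repository is the reference for the actual method.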

Reinforcement Learning from Demonstration and Human Reward

Improving Interactive Reinforcement Agent Planning with Human Demonstration

Off-Dynamics Inverse Reinforcement Learning from Hetero-Domain

Off-Dynamics Inverse Reinforcement Learning

Towards Learning from Implicit Human Reward (Extended Abstract)

GAN-Based Interactive Reinforcement Learning from Demonstration and Human Evaluative Feedback

Transferring knowledge from human-demonstration trajectories to reinforcement learning

DQN-TAMER: Human-in-the-Loop Reinforcement Learning with Intractable Feedback

A Large-Scale Study of Agents Learning from Human Reward

Demonstration actor critic

Model-based Adversarial Imitation Learning from Demonstrations and Human Reward

An Efficient Unified Approach Using Demonstrations for Inverse Reinforcement Learning

Facial feedback for reinforcement learning: a case study and offline analysis using the TAMER framework

Reinforcement Learning via Reasoning from Demonstration

Learning Human Rewards by Inferring Their Latent Intelligence Levels in Multi-Agent Games: A Theory-of-Mind Approach with Application to Driving Data

Continuous Reinforcement Learning From Human Demonstrations With Integrated Experience Replay For Autonomous Driving

Learning from Suboptimal Demonstration via Self-Supervised Reward Regression

Learning Traffic Signal Control from Demonstrations

Distance-rank Aware Sequential Reward Learning for Inverse Reinforcement Learning with Sub-optimal Demonstrations

A reinforcement learning algorithm acquires demonstration from the training agent by dividing the task space

Automata Guided Reinforcement Learning With Demonstrations