Abstract: Imitation learning is a widely used paradigm for decision making that learns from expert demonstrations. Existing imitation learning algorithms often require many interactions between the agent and the environment in which the demonstrations were collected. Acquiring expert demonstrations in a simulator usually requires specialized knowledge, while real-world interactions are limited by security or cost concerns. Directly applying existing imitation learning algorithms in either the real world or a simulator is therefore not ideal. In this paper, we propose a cross-domain Inverse Reinforcement Learning training paradigm that learns a reward function from an expert's demonstrations collected in a different domain, while limiting interaction with the environment in which the demonstrations were obtained. To address the distribution shift that arises under this paradigm, we propose a transfer learning method called off-dynamics Inverse Reinforcement Learning. The intuition is that reward learning should not only imitate the expert but also encourage actions that adapt to the difference in dynamics between the two domains. Specifically, we adopt a widely used Inverse Reinforcement Learning framework and modify its discriminator, which identifies agent-generated trajectories, with a quantified measure of the dynamics difference. Training the discriminator yields a transferable reward function suited to the target dynamics, a property guaranteed by our theoretical derivation. Off-dynamics Inverse Reinforcement Learning assigns higher rewards to demonstration trajectories that do not exploit discrepancies between the two domains. Extensive experiments on continuous control tasks demonstrate the method's effectiveness and its scalability to high-dimensional tasks. Our code is available on the project website: https://github.com/yachenkang/ODIRL.
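
The discriminator modification described in the abstract can be pictured with a small sketch. It is an illustration under stated assumptions, not the paper's formulation: it assumes an adversarial-IRL-style logit (a learned reward term minus the policy log-probability) and a dynamics-difference term defined as a log-ratio of transition likelihoods in the two domains. The function names, the sign of the correction, and where it enters the logit are all hypothetical choices made for the example.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dynamics_gap(logp_target, logp_source):
    """Assumed measure of the dynamics difference for one transition:
    log p_target(s'|s,a) - log p_source(s'|s,a).
    It is zero when both domains agree on the transition."""
    return logp_target - logp_source


def discriminator_prob(f_value, log_pi, delta):
    """Probability that a transition comes from the expert.

    f_value : learned reward-like score f(s, a, s') (hypothetical)
    log_pi  : log-probability of the action under the agent's policy
    delta   : dynamics gap from dynamics_gap()
    Adding delta to the logit is an assumption of this sketch.
    """
    return sigmoid(f_value - log_pi + delta)


def discriminator_loss(expert_probs, agent_probs, eps=1e-8):
    """Binary cross-entropy: expert transitions labelled 1, agent ones 0."""
    expert_term = sum(math.log(p + eps) for p in expert_probs) / len(expert_probs)
    agent_term = sum(math.log(1.0 - p + eps) for p in agent_probs) / len(agent_probs)
    return -(expert_term + agent_term)


# A transition that is much likelier in the source domain than in the
# target domain gets a negative gap, which lowers its discriminator score.
gap = dynamics_gap(math.log(0.05), math.log(0.60))
p_plain = discriminator_prob(1.0, -1.0, 0.0)
p_shifted = discriminator_prob(1.0, -1.0, gap)
```

In this toy setup a transition that relies on a discrepancy between the two domains scores lower than one on which the domains agree, which matches the qualitative behaviour stated in the abstract. The exact correction term is given in the paper and its released code.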

RIDM: Reinforced Inverse Dynamics Modeling for Learning from a Single Observed Demonstration

Off-Dynamics Inverse Reinforcement Learning

Model-Based Inverse Reinforcement Learning from Visual Demonstrations

Model Predictive Optimization for Imitation Learning from Demonstrations

Learning from Suboptimal Demonstration via Self-Supervised Reward Regression

Better-than-Demonstrator Imitation Learning via Automatically-Ranked Demonstrations

A Dual Approach to Imitation Learning from Observations with Offline Datasets

Learning Feasibility to Imitate Demonstrators with Different Dynamics

RILe: Reinforced Imitation Learning

Learning from Demonstration Framework for Multi-Robot Systems Using Interaction Keypoints and Soft Actor-Critic Methods

RLIF: Interactive Imitation Learning as Reinforcement Learning

Reinforcement Learning-based Learning from Demonstrations for Collaborative Robots

Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards

A Differential Dynamic Programming Framework for Inverse Reinforcement Learning

RoCoDA: Counterfactual Data Augmentation for Data-Efficient Robot Learning from Demonstrations

Demonstration Guided Actor-Critic Deep Reinforcement Learning for Fast Teaching of Robots in Dynamic Environments

Extended Reality System for Robotic Learning from Human Demonstration

GAN-Based Interactive Reinforcement Learning from Demonstration and Human Evaluative Feedback

Inverse reinforcement learning for dexterous hand manipulation

Learning to Solve Tasks with Exploring Prior Behaviours

Imitation-Enhanced Reinforcement Learning with Privileged Smooth Transition for Hexapod Locomotion