Abstract: Imitation learning is a widely used paradigm for decision making that learns from expert demonstrations. Existing imitation algorithms often require many interactions between the agent and the environment from which the demonstrations are obtained. Acquiring expert demonstrations in a simulator usually requires specialized knowledge. In addition, real-world interactions are limited by safety or cost concerns. Therefore, directly applying existing imitation learning algorithms in either the real world or a simulator is not an ideal strategy. In this paper, we propose a cross-domain Inverse Reinforcement Learning training paradigm that learns a reward function from the demonstrations of an expert in a heterogeneous domain, while requiring only limited interaction with the environment in which the demonstrations were obtained. To address the distribution shift under this paradigm, we propose a transfer learning method called off-dynamics Inverse Reinforcement Learning. The intuition behind off-dynamics Inverse Reinforcement Learning is that reward function learning should not only imitate the expert but also encourage the agent's actions to adapt to the dynamics difference between the two heterogeneous domains. Specifically, we adopt a widely used Inverse Reinforcement Learning framework and modify its discriminator, which identifies agent-generated trajectories, with a quantified dynamics difference. Training this discriminator yields a transferable reward function suited to the target dynamics, as guaranteed by our theoretical derivation. Off-dynamics Inverse Reinforcement Learning assigns higher rewards to demonstration trajectories that do not exploit discrepancies between the two domains. Extensive experiments on continuous control tasks demonstrate the method's effectiveness and its scalability to high-dimensional tasks. Our code is available on the project website: https://github.com/yachenkang/ODIRL.
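The abstract does not spell out the modified discriminator. What follows is a minimal sketch under stated assumptions, not the authors' implementation; the linked repository is the authoritative source. It assumes the "widely used Inverse Reinforcement Learning framework" is AIRL-style, where the discriminator logit is f(s, a, s') − log π(a|s). It also assumes the "quantified dynamics difference" is estimated DARC-style as Δr = log p_target(s'|s, a) − log p_source(s'|s, a) from two domain classifiers, and is added to that logit. All function names (`dynamics_gap`, `discriminator_logit`, `discriminator_loss`) are hypothetical.

```python
import math


def log_ratio(p: float) -> float:
    """log(p / (1 - p)) for a classifier probability p in (0, 1)."""
    return math.log(p) - math.log1p(-p)


def dynamics_gap(q_target_sas: float, q_target_sa: float) -> float:
    """DARC-style estimate of log p_tgt(s'|s,a) - log p_src(s'|s,a) (assumption).

    q_target_sas: classifier probability that (s, a, s') came from the target domain.
    q_target_sa:  classifier probability that (s, a) came from the target domain.
    By Bayes' rule the domain priors cancel, leaving the transition log-ratio.
    """
    return log_ratio(q_target_sas) - log_ratio(q_target_sa)


def discriminator_logit(f: float, log_pi: float, delta_r: float) -> float:
    """Logit of an AIRL-style D = exp(f) / (exp(f) + pi(a|s)),
    with the dynamics gap added to f (hypothetical modification)."""
    return f + delta_r - log_pi


def sigmoid(x: float) -> float:
    """Numerically stable logistic function: D = sigmoid(logit)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softplus(x: float) -> float:
    """Stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def discriminator_loss(expert_logits, agent_logits) -> float:
    """Binary cross-entropy with expert transitions labelled 1 and agent ones 0.

    -log D = softplus(-logit) and -log(1 - D) = softplus(logit).
    """
    expert_term = sum(softplus(-x) for x in expert_logits) / len(expert_logits)
    agent_term = sum(softplus(x) for x in agent_logits) / len(agent_logits)
    return expert_term + agent_term
```

With the same f and policy log-probability, a transition the classifiers judge more likely under the target dynamics (for example `dynamics_gap(0.8, 0.5)`, about log 4) receives a larger logit, and hence a larger AIRL-style reward log D − log(1 − D), than one favoured by the source dynamics (`dynamics_gap(0.2, 0.5)`). This mirrors the abstract's statement that trajectories not exploiting the domain discrepancy are rewarded more highly.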

Contractive Dynamical Imitation Policies for Efficient Out-of-Sample Recovery

Off-Dynamics Inverse Reinforcement Learning

Globally Stable Neural Imitation Policies

Out-of-Distribution Recovery with Object-Centric Keypoint Inverse Policy For Visuomotor Imitation Learning

End-to-End Stable Imitation Learning via Autonomous Neural Dynamic Policies

Robust Imitation of a Few Demonstrations with a Backwards Model

Imitation Learning via Simultaneous Optimization of Policies and Auxiliary Trajectories

Learning Lyapunov-Stable Polynomial Dynamical Systems Through Imitation

Efficient Imitation Learning with Conservative World Models

Off-Policy Imitation Learning from Observations

Robot Policy Improvement With Natural Evolution Strategies for Stable Nonlinear Dynamical System

Dynamic Regret Convergence Analysis and an Adaptive Regularization Algorithm for On-Policy Robot Imitation Learning

LSTM Learn Policy from Dynamical System of Demonstration Motions for Robot Imitation Learning

Offline Imitation Learning with a Misspecified Simulator

Minimax Iterative Dynamic Game: Application to Nonlinear Robot Control Tasks

Iterative Regularized Policy Optimization with Imperfect Demonstrations

Improved Policy Optimization for Online Imitation Learning

Robust Visual Imitation Learning with Inverse Dynamics Representations

Online Adaptation for Enhancing Imitation Learning Policies

DropoutDAgger: A Bayesian Approach to Safe Imitation Learning

Imitator Learning: Achieve Out-of-the-Box Imitation Ability in Variable Environments