Abstract: Imitation learning is a widely used paradigm for decision making that learns from expert demonstrations. Existing imitation algorithms often require many interactions between the agent and the environment from which the demonstrations are obtained. However, acquiring expert demonstrations in a simulator usually requires specialized knowledge, and real-world interactions are limited by safety or cost concerns. Therefore, directly applying existing imitation learning algorithms in either the real world or a simulator is not an ideal strategy. In this paper, we propose a cross-domain Inverse Reinforcement Learning training paradigm that learns a reward function from an expert's demonstrations collected in a different domain, while interactions with the environment from which the demonstrations are obtained remain limited. To address the distribution shift under this paradigm, we propose a transfer learning method called off-dynamics Inverse Reinforcement Learning. The intuition behind off-dynamics Inverse Reinforcement Learning is that reward learning should not only imitate the expert but also encourage actions that adapt to the dynamics difference between the two domains. Specifically, we adopt a widely used Inverse Reinforcement Learning framework and modify its discriminator, which identifies agent-generated trajectories, with a quantified dynamics difference. Training this discriminator yields a transferable reward function suited to the target dynamics, as guaranteed by our theoretical derivation. Off-dynamics Inverse Reinforcement Learning assigns higher rewards to demonstration trajectories that do not exploit discrepancies between the two domains. Extensive experiments on continuous control tasks demonstrate the method's effectiveness and its scalability to high-dimensional tasks. Our code is available on the project website: https://github.com/yachenkang/ODIRL.
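The abstract does not specify how the "quantified dynamics difference" enters the discriminator. What follows is only a minimal sketch under stated assumptions, not the paper's implementation.

It assumes an AIRL-style discriminator, D = exp(f) / (exp(f) + π(a|s)), as the "widely used" framework. It also assumes the dynamics gap log p_target(s'|s,a) - log p_source(s'|s,a) is estimated with two binary domain classifiers, in the style of DARC (Domain Adaptation with Rewards from Classifiers). Adding that gap to the discriminator logit is a hypothetical choice, and all function names are illustrative.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dynamics_gap(logit_target_sas: float, logit_target_sa: float) -> float:
    """Estimate log p_target(s'|s,a) - log p_source(s'|s,a).

    ASSUMPTION (DARC-style): both inputs are logits of binary domain
    classifiers trained on the same transitions, q(target | s, a, s') and
    q(target | s, a). By Bayes' rule, the difference of their log-odds
    equals the log-ratio of the two transition densities.
    """
    return logit_target_sas - logit_target_sa


def discriminator_logit(f: float, log_pi: float, gap: float = 0.0) -> float:
    """Logit of an AIRL-style D = exp(f) / (exp(f) + pi(a|s)), i.e. f - log pi.

    HYPOTHETICAL modification: the dynamics gap is added to the learned
    term f, so transitions that are unlikely under the target dynamics
    look less expert-like and receive a lower reward.
    """
    return f + gap - log_pi


def reward(f: float, log_pi: float, gap: float = 0.0) -> float:
    """AIRL-style reward log D - log(1 - D) for D = sigmoid(logit)."""
    z = discriminator_logit(f, log_pi, gap)
    return log_sigmoid(z) - log_sigmoid(-z)


def discriminator_loss(expert_logits: list[float], policy_logits: list[float]) -> float:
    """Mean binary cross-entropy: expert transitions labelled 1, agent transitions 0."""
    loss = -sum(log_sigmoid(z) for z in expert_logits)
    loss -= sum(log_sigmoid(-z) for z in policy_logits)
    return loss / (len(expert_logits) + len(policy_logits))
```

For example, `reward(1.0, -0.7, dynamics_gap(-2.0, 0.5))` evaluates to approximately -0.8. The estimated gap of -2.5 marks the transition as much less likely under the target dynamics, which pulls the reward down. Because D is a sigmoid of its logit, log D - log(1 - D) simply recovers the logit. The explicit form is kept only to mirror the usual AIRL reward expression.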

Distributional Cloning for Stabilized Imitation Learning via ADMM

Off-Dynamics Inverse Reinforcement Learning

ADR-BC: Adversarial Density Weighted Regression Behavior Cloning

Distributionally Robust Behavioral Cloning for Robust Imitation Learning

Diffusion Model-Augmented Behavioral Cloning

Adversarial Imitation Learning via Boosting

Adversarial imitation learning with mixed demonstrations from multiple demonstrators

Efficient Off-policy Adversarial Imitation Learning with Imperfect Demonstrations

C3DM: Constrained-Context Conditional Diffusion Models for Imitation Learning

On Generalization of Adversarial Imitation Learning and Beyond

Distributional generative adversarial imitation learning with reproducing kernel generalization

SS-MAIL: Self-Supervised Multi-Agent Imitation Learning

No Need for Interactions: Robust Model-Based Imitation Learning using Neural ODE

Wasserstein Distance guided Adversarial Imitation Learning with Reward Shape Exploration

C-GAIL: Stabilizing Generative Adversarial Imitation Learning with Control Theory

A Simple Solution for Offline Imitation from Observations and Examples with Possibly Incomplete Trajectories

Imitation Learning via Simultaneous Optimization of Policies and Auxiliary Trajectories

Data Efficient Behavior Cloning for Fine Manipulation via Continuity-based Corrective Labels

DiffAIL: Diffusion Adversarial Imitation Learning

State-only Imitation with Transition Dynamics Mismatch

Stable-BC: Controlling Covariate Shift with Stable Behavior Cloning