Abstract: Imitation learning is a widely used paradigm for decision making that learns from expert demonstrations. Existing imitation algorithms often require many interactions between the agent and the environment from which the demonstrations are obtained. Acquiring expert demonstrations in a simulator usually requires specialized knowledge, while real-world interactions are limited by safety or cost concerns. Directly applying existing imitation learning algorithms in either the real world or a simulator is therefore not an ideal strategy. In this paper, we propose a cross-domain Inverse Reinforcement Learning training paradigm that learns a reward function from hetero-domain expert demonstrations while limiting interaction with the environment in which the demonstrations are obtained. To address the distribution shift that arises under this paradigm, we propose a transfer learning method called off-dynamics Inverse Reinforcement Learning. The intuition behind off-dynamics Inverse Reinforcement Learning is that the goal of reward function learning is not only to imitate the expert, but also to promote action adaptation to the dynamics difference between the two domains. Specifically, we adopt a widely used Inverse Reinforcement Learning framework and modify its discriminator, which identifies agent-generated trajectories, with a quantified dynamics difference. Training the discriminator yields a transferable reward function suited to the target dynamics, as guaranteed by our theoretical derivation. Off-dynamics Inverse Reinforcement Learning assigns higher rewards to demonstration trajectories that do not exploit discrepancies between the two domains. Extensive experiments on continuous control tasks demonstrate the method's effectiveness and its scalability to high-dimensional tasks. Our code is available on the project website: https://github.com/yachenkang/ODIRL.

Adaptive Generative Adversarial Maximum Entropy Inverse Reinforcement Learning

Off-Dynamics Inverse Reinforcement Learning

AdaBoost Maximum Entropy Deep Inverse Reinforcement Learning with Truncated Gradient

Maximum Entropy Reinforcement Learning with Evolution Strategies

Adversarial Imitation via Variational Inverse Reinforcement Learning

When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning

Auto-Encoding Adversarial Imitation Learning

Improve generated adversarial imitation learning with reward variance regularization

Generative Adversarial Imitation Learning with Neural Networks: Global Optimality and Convergence Rate

Exploring Gradient Explosion in Generative Adversarial Imitation Learning: A Probabilistic Perspective

When Will Generative Adversarial Imitation Learning Algorithms Attain Global Convergence

GAN-Based Interactive Reinforcement Learning from Demonstration and Human Evaluative Feedback

Generative Adversarial Imagination for Sample Efficient Deep Reinforcement Learning

Discovering General Reinforcement Learning Algorithms with Adversarial Environment Design

A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models

An Effective Maximum Entropy Exploration Approach for Deceptive Game in Reinforcement Learning

Generative Adversarial Exploration for Reinforcement Learning

Inverse Reinforcement Learning by Estimating Expertise of Demonstrators

Maximum Entropy Inverse Reinforcement Learning of Diffusion Models with Energy-Based Models

Adaptive Language-Guided Abstraction from Contrastive Explanations

On Computation and Generalization of Generative Adversarial Imitation Learning