Abstract: Imitation learning is a widely used paradigm for decision making that learns from expert demonstrations. Existing imitation algorithms often require many interactions between the agent and the environment from which the demonstrations were obtained. Acquiring expert demonstrations in a simulator usually requires specialized knowledge, while real-world interactions are limited by security or cost concerns. Directly applying existing imitation learning algorithms in either the real world or a simulator is therefore not ideal. In this paper, we propose a cross-domain Inverse Reinforcement Learning training paradigm that learns a reward function from the demonstrations of an expert in a different domain, while limiting interaction with the environment from which the demonstrations were obtained. To address the distribution shift that arises under this paradigm, we propose a transfer learning method called off-dynamics Inverse Reinforcement Learning. The intuition is that the goal of reward function learning is not only to imitate the expert, but also to encourage actions that adapt to the dynamics difference between the two domains. Specifically, we adopt a widely used Inverse Reinforcement Learning framework and modify its discriminator, which identifies agent-generated trajectories, with a quantified measure of the dynamics difference. Training the discriminator yields a transferable reward function suited to the target dynamics, as guaranteed by our theoretical derivation. Off-dynamics Inverse Reinforcement Learning assigns higher rewards to demonstration trajectories that do not exploit discrepancies between the two domains. Extensive experiments on continuous control tasks demonstrate the method's effectiveness and its scalability to high-dimensional tasks. Our code is available on the project website: https://github.com/yachenkang/ODIRL.
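
The idea of modifying a discriminator with a quantified dynamics difference can be illustrated with a toy sketch. This is not the paper's implementation, and everything in it is an assumption made for illustration: the 1-D Gaussian dynamics, the names `dynamics_gap` and `discriminator_prob`, the use of a log-likelihood ratio as the measure of the dynamics difference, and both the additive form and the sign with which the gap enters the discriminator logit. The abstract does not specify any of these details.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log-density of a 1-D Gaussian N(mean, std^2) evaluated at x."""
    var = std * std
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def dynamics_gap(s, a, s_next, drift_a, drift_b, std=0.1):
    """Hypothetical dynamics-difference measure for one transition.

    Assumed toy dynamics: in each domain, s' ~ N(s + drift * a, std^2).
    Returns log p_A(s'|s,a) - log p_B(s'|s,a), which is zero whenever
    the two domains agree on the transition.
    """
    logp_a = gaussian_logpdf(s_next, s + drift_a * a, std)
    logp_b = gaussian_logpdf(s_next, s + drift_b * a, std)
    return logp_a - logp_b


def discriminator_prob(f_value, log_pi, gap):
    """Discriminator output with an assumed additive dynamics correction.

    Without the gap, logit = f - log pi is the common adversarial IRL
    form. Adding `gap` to the logit (and its sign) is an illustrative
    assumption, not the form derived in the paper.
    """
    logit = f_value + gap - log_pi
    # Numerically stable sigmoid.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


# A transition both domains agree on has no gap; one that is only
# plausible under domain A's dynamics has a large positive gap.
same = dynamics_gap(s=0.0, a=1.0, s_next=1.0, drift_a=1.0, drift_b=1.0)
diff = dynamics_gap(s=0.0, a=1.0, s_next=1.0, drift_a=1.0, drift_b=0.5)
print(same, diff)
print(discriminator_prob(0.0, 0.0, same), discriminator_prob(0.0, 0.0, diff))
```

In this sketch, a transition that is equally likely under both domains leaves the discriminator unchanged, while a transition that depends on the discrepancy between the domains shifts its output. In practice the transition densities are unknown and the gap would have to be estimated from data.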
