Abstract: Imitation learning is a widely used paradigm for decision making that learns from expert demonstrations. Existing imitation algorithms often require many interactions between the agent and the environment from which the demonstrations are obtained. Acquiring expert demonstrations in a simulator usually requires specialized knowledge, while real-world interactions are limited by security or cost concerns. Directly applying existing imitation learning algorithms in either the real world or a simulator is therefore not an ideal strategy. In this paper, we propose a cross-domain Inverse Reinforcement Learning training paradigm that learns a reward function from hetero-domain expert demonstrations while limiting interaction with the environment in which the demonstrations are obtained. To address the distribution shift that arises under this paradigm, we propose a transfer learning method called off-dynamics Inverse Reinforcement Learning. The intuition behind off-dynamics Inverse Reinforcement Learning is that the goal of reward function learning is not only to imitate the expert, but also to encourage actions that adapt to the dynamics difference between the two domains. Specifically, we adopt a widely used Inverse Reinforcement Learning framework and modify its discriminator, which identifies agent-generated trajectories, with a quantified measure of the dynamics difference. Training this discriminator yields a transferable reward function suited to the target dynamics, as guaranteed by our theoretical derivation. Off-dynamics Inverse Reinforcement Learning assigns higher rewards to demonstration trajectories that do not exploit discrepancies between the two domains. Extensive experiments on continuous control tasks demonstrate the method's effectiveness and its scalability to high-dimensional tasks. Our code is available on the project website: https://github.com/yachenkang/ODIRL.
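
The idea of a discriminator adjusted by a quantified dynamics difference can be illustrated with a minimal sketch. This is not the formulation from the paper, which the abstract does not spell out. It assumes an adversarial-IRL-style discriminator D = exp(f) / (exp(f) + pi(a|s)), and it assumes (hypothetically) that the dynamics difference is measured as a log-ratio of transition probabilities and enters the discriminator logit additively. All function and parameter names are illustrative.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def airl_logit(f_value: float, log_pi: float) -> float:
    """Logit of an adversarial-IRL-style discriminator.

    sigmoid(f - log pi) equals exp(f) / (exp(f) + pi(a|s)).
    """
    return f_value - log_pi


def dynamics_gap(log_p_target: float, log_p_source: float) -> float:
    """ASSUMPTION: dynamics difference as a log-ratio of transition
    probabilities, log p_target(s'|s,a) - log p_source(s'|s,a)."""
    return log_p_target - log_p_source


def corrected_discriminator(f_value: float, log_pi: float,
                            log_p_target: float, log_p_source: float) -> float:
    """ASSUMPTION: the gap is added to the logit, so transitions that are
    unlikely under the target dynamics receive a lower score."""
    logit = airl_logit(f_value, log_pi) + dynamics_gap(log_p_target, log_p_source)
    return sigmoid(logit)


if __name__ == "__main__":
    # Identical dynamics: the correction vanishes and the score is plain AIRL.
    same = corrected_discriminator(0.0, math.log(0.5), -1.0, -1.0)
    # Transition less likely under the target dynamics: the score drops.
    shifted = corrected_discriminator(0.0, math.log(0.5), -3.0, -1.0)
    print(same, shifted)
```

Under these assumptions, the correction is zero when the two domains share the same dynamics, and it lowers the discriminator output for transitions that are plausible only in the source domain. In practice the transition log-probabilities would have to be estimated rather than given.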

Aligning Human Intent from Imperfect Demonstrations with Confidence-based Inverse soft-Q Learning

Off-Dynamics Inverse Reinforcement Learning

Confidence-Aware Imitation Learning from Demonstrations with Varying Optimality

Robot Learning from Human Demonstrations with Inconsistent Contexts

MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention

Mimicking Human Intuition: Cognitive Belief-Driven Q-Learning

Imitator Learning: Achieve Out-of-the-Box Imitation Ability in Variable Environments

Imitation Learning from Purified Demonstrations

Contrast, Imitate, Adapt: Learning Robotic Skills From Raw Human Videos

Learning Feasibility to Imitate Demonstrators with Different Dynamics

Extrinsicaly Rewarded Soft Q Imitation Learning with Discriminator

Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations

LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning

Pretrain Soft Q-Learning with Imperfect Demonstrations

Leveraging Demonstrator-perceived Precision for Safe Interactive Imitation Learning of Clearance-limited Tasks

Reinforcement Learning From Imperfect Corrective Actions And Proxy Rewards

Seeing Differently, Acting Similarly: Heterogeneously Observable Imitation Learning

How to Leverage Diverse Demonstrations in Offline Imitation Learning

Imitation Learning from Imperfection: Theoretical Justifications and Algorithms

Limited Preference Aided Imitation Learning from Imperfect Demonstrations

SWBT: Similarity Weighted Behavior Transformer with the Imperfect Demonstration for Robotic Manipulation