Abstract: Imitation learning is a widely used paradigm for decision making that learns from expert demonstrations. Existing imitation algorithms often require multiple interactions between the agent and the environment from which the demonstrations are obtained. Acquiring expert demonstrations in a simulator usually requires specialized knowledge, while real-world interactions are limited by security or cost concerns. Therefore, directly applying existing imitation learning algorithms in either the real world or a simulator is not ideal. In this paper, we propose a cross-domain Inverse Reinforcement Learning training paradigm that learns a reward function from the demonstrations of an expert in a different domain, while limiting interaction with the environment in which the demonstrations were obtained. To address the distribution shift that arises under this paradigm, we propose a transfer learning method called off-dynamics Inverse Reinforcement Learning. The intuition is that the goal of reward function learning is not only to imitate the expert, but also to promote action adaptation to the dynamics difference between the two domains. Specifically, we adopt a widely used Inverse Reinforcement Learning framework and modify its discriminator, which identifies agent-generated trajectories, with a quantified dynamics difference. Training the discriminator yields a transferable reward function suited to the target dynamics, which is guaranteed by our theoretical derivation. Off-dynamics Inverse Reinforcement Learning assigns higher rewards to demonstration trajectories that do not exploit discrepancies between the two domains. Extensive experiments on continuous control tasks demonstrate the method's effectiveness and its scalability to high-dimensional tasks. Our code is available on the project website: https://github.com/yachenkang/ODIRL.
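As a rough illustration of the idea of modifying a discriminator with a quantified dynamics difference, the sketch below assumes an adversarial IRL discriminator of the form D = exp(f) / (exp(f) + pi(a|s)) and assumes the dynamics difference is a log-likelihood ratio of the next state under the two domains. Neither the exact form nor the sign convention is stated in the abstract, and the names `dynamics_gap`, `discriminator_prob`, `log_p_domain_a`, and `log_p_domain_b` are hypothetical; this is not the authors' implementation.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dynamics_gap(log_p_domain_a, log_p_domain_b):
    """Assumed dynamics difference for one transition (s, a, s'):
    log p_A(s'|s,a) - log p_B(s'|s,a). It is zero when both domains
    agree on how likely the transition is."""
    return log_p_domain_a - log_p_domain_b


def discriminator_prob(f_value, log_pi, delta=0.0):
    """Assumed adversarial-IRL-style discriminator with a correction term:
    D = exp(f + delta) / (exp(f + delta) + pi(a|s)),
    which equals sigmoid(f + delta - log pi(a|s))."""
    return sigmoid(f_value + delta - log_pi)


if __name__ == "__main__":
    # Transition equally likely in both domains: correction vanishes.
    same = dynamics_gap(-1.2, -1.2)
    # Transition much less likely in domain A than in domain B.
    mismatch = dynamics_gap(-4.0, -1.0)
    print(discriminator_prob(0.5, -0.7, same))
    print(discriminator_prob(0.5, -0.7, mismatch))
```

Under these assumptions, a transition whose likelihood matches across the two domains leaves the discriminator output unchanged, while a transition that is only plausible in one domain shifts the logit, and with it the recovered reward.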

Cross-domain policy adaptation with dynamics alignment

Off-Dynamics Inverse Reinforcement Learning

Cross-Domain Policy Adaptation by Capturing Representation Mismatch

Cross Domain Policy Transfer with Effect Cycle-Consistency

Cross-Domain Policy Adaptation via Value-Guided Data Filtering

Cross-Domain Policy Transfer by Representation Alignment via Multi-Domain Behavioral Cloning

Learning Cross-Domain Correspondence for Control with Dynamics Cycle-Consistency

xTED: Cross-Domain Adaptation via Diffusion-Based Trajectory Editing

Cross-Modal Domain Adaptation for Reinforcement Learning

Unsupervised Domain Adaptation with Dynamics-Aware Rewards in Reinforcement Learning

Cross-Modal Domain Adaptation for Cost-Efficient Visual Reinforcement Learning

Cross-Domain Communications Between Agents Via Adversarial-Based Domain Adaptation in Reinforcement Learning

Cross-Embodiment Robot Manipulation Skill Transfer using Latent Space Alignment

Off-Dynamics Reinforcement Learning via Domain Adaptation and Reward Augmented Imitation

Sim-to-Real Policy and Reward Transfer with Adaptive Forward Dynamics Model

OMPO: A Unified Framework for RL under Policy and Dynamics Shifts

Prototypical context-aware dynamics generalization for high-dimensional model-based reinforcement learning

A Comprehensive Survey of Cross-Domain Policy Transfer for Embodied Agents

Dynamics-Aware Adaptation for Reinforcement Learning Based Cross-Domain Interactive Recommendation

Quantification Before Selection: Active Dynamics Preference for Robust Reinforcement Learning

AdaPT: Zero-Shot Adaptive Policy Transfer for Stochastic Dynamical Systems