Abstract: Imitation learning is a widely used paradigm for decision making that learns from expert demonstrations. Existing imitation learning algorithms often require many interactions between the agent and the environment from which the demonstrations were obtained. However, acquiring expert demonstrations in a simulator usually requires specialized knowledge, and real-world interactions are limited by safety or cost concerns. Directly applying existing imitation learning algorithms in either the real world or a simulator is therefore not an ideal strategy. In this paper, we propose a cross-domain Inverse Reinforcement Learning training paradigm that learns a reward function from an expert's demonstrations in a different domain, while limiting interaction with the environment in which the demonstrations were collected. To address the distribution shift under this paradigm, we propose a transfer learning method called off-dynamics Inverse Reinforcement Learning. The intuition behind off-dynamics Inverse Reinforcement Learning is that reward learning should not only imitate the expert but also encourage actions that adapt to the dynamics difference between the two domains. Specifically, we adopt a widely used Inverse Reinforcement Learning framework and modify its discriminator, which identifies agent-generated trajectories, with a quantified measure of the dynamics difference. Our theoretical derivation guarantees that training this discriminator yields a transferable reward function suited to the target dynamics. Off-dynamics Inverse Reinforcement Learning assigns higher rewards to demonstration trajectories that do not exploit discrepancies between the two domains. Extensive experiments on continuous control tasks demonstrate the method's effectiveness and its scalability to high-dimensional tasks. Our code is available on the project website: https://github.com/yachenkang/ODIRL.
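The idea of modifying an IRL discriminator with a quantified dynamics difference can be illustrated with a minimal sketch. It assumes an AIRL-style discriminator, whose logit is f(s,a) - log pi(a|s), and a DARC-style estimate of the dynamics gap log p_target(s'|s,a) - log p_source(s'|s,a) obtained from two domain classifiers. The abstract specifies neither choice, and the way the gap is added to the learned reward term (scaled by a hypothetical `weight`) is an assumption for illustration, not the authors' formulation. All function names are hypothetical.

```python
import numpy as np


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x)) = -log(1 + exp(-x))."""
    return -np.logaddexp(0.0, -x)


def dynamics_gap(logit_sas, logit_sa):
    """DARC-style estimate of log p_tgt(s'|s,a) - log p_src(s'|s,a).

    logit_sas: logit of a (hypothetical) domain classifier q(target | s, a, s')
    logit_sa:  logit of a (hypothetical) domain classifier q(target | s, a)
    By Bayes' rule the gap is
        [log q(tgt|s,a,s') - log q(src|s,a,s')] - [log q(tgt|s,a) - log q(src|s,a)],
    and for a logistic classifier each bracket equals its logit.
    """
    return logit_sas - logit_sa


def airl_logit(f, log_pi):
    """Logit of an AIRL-style discriminator D = exp(f) / (exp(f) + pi(a|s))."""
    return f - log_pi


def corrected_logit(f, log_pi, gap, weight=1.0):
    """Assumed dynamics-aware logit: the gap shifts the learned reward term f.

    Transitions that behave alike in both domains (gap ~ 0) are unaffected;
    transitions much less likely under target dynamics (gap << 0) score lower.
    """
    return airl_logit(f + weight * gap, log_pi)


def discriminator_loss(logit_expert, logit_agent):
    """Binary cross-entropy with expert samples labeled 1 and agent samples 0."""
    return -(np.mean(log_sigmoid(logit_expert)) + np.mean(log_sigmoid(-logit_agent)))


if __name__ == "__main__":
    f = np.array([1.0, 1.0])         # learned reward term f(s, a)
    log_pi = np.array([-0.5, -0.5])  # log pi(a|s) under the current policy
    # Transition 0 is equally likely in both domains; transition 1 is far
    # less likely in the target domain than in the source domain.
    gap = dynamics_gap(np.array([0.3, -2.0]), np.array([0.3, 0.3]))
    logits = corrected_logit(f, log_pi, gap)
    # For an AIRL-style discriminator, log D - log(1 - D) equals the logit,
    # so the logit doubles as the per-transition reward.
    print("gap:", gap, "reward:", logits)
```

Because log sigmoid(x) - log sigmoid(-x) = x, the four log-probability terms of the classifier-based gap collapse to a difference of two logits. Under these assumptions, a transition that the source domain permits but the target domain makes unlikely receives a lower reward, which mirrors the abstract's claim that the method favors demonstrations that do not exploit discrepancies between the two domains.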

Optimizing High-dimensional Learner with Low-Dimension Action Features

Off-Dynamics Inverse Reinforcement Learning

Learning with Training Wheels: Speeding up Training with a Simple Controller for Deep Reinforcement Learning

Deep Reinforcement Learning Based Co-Optimization of Morphology and Gait for Small-Scale Legged Robot

Learning Hierarchical Behavior and Motion Planning for Autonomous Driving

Learning of Long-Horizon Sparse-Reward Robotic Manipulator Tasks With Base Controllers

Enhancing Task Performance of Learned Simplified Models via Reinforcement Learning

Discovering Synergies for Robot Manipulation with Multi-Task Reinforcement Learning

Opt2Skill: Imitating Dynamically-feasible Whole-Body Trajectories for Versatile Humanoid Loco-Manipulation

Efficient Intrinsically Motivated Robotic Grasping with Learning-Adaptive Imagination in Latent Space

HYPERmotion: Learning Hybrid Behavior Planning for Autonomous Loco-manipulation

Motion Sequence Learning for Robot Walking Based on Pose optimization

Real-World Dexterous Object Manipulation based Deep Reinforcement Learning

Human Motor Learning Dynamics in High-dimensional Tasks

High-Dimensional Controller Tuning through Latent Representations

Bi-Level Motion Imitation for Humanoid Robots

RobotKeyframing: Learning Locomotion with High-Level Objectives via Mixture of Dense and Sparse Rewards

A High-Efficient Reinforcement Learning Approach for Dexterous Manipulation

Imitation Learning with Limited Actions via Diffusion Planners and Deep Koopman Controllers

Model-based reinforcement learning with dimension reduction

Learning Sparse Control Tasks from Pixels by Latent Nearest-Neighbor-Guided Explorations