Abstract: Imitation learning is a widely used paradigm for decision making that learns from expert demonstrations. Existing imitation learning algorithms often require many interactions between the agent and the environment from which the demonstrations are obtained. Acquiring expert demonstrations in a simulator usually requires specialized knowledge, and real-world interactions are limited by safety or cost concerns. Therefore, directly applying existing imitation learning algorithms in either the real world or a simulator is not an ideal strategy. In this paper, we propose a cross-domain Inverse Reinforcement Learning training paradigm that learns a reward function from an expert's demonstrations collected in a different domain, while interaction with the environment in which the demonstrations were obtained is limited. To address the distribution shift under this paradigm, we propose a transfer learning method called off-dynamics Inverse Reinforcement Learning. Its underlying intuition is that reward learning should not only imitate the expert but also encourage actions that adapt to the dynamics difference between the two domains. Specifically, we adopt a widely used Inverse Reinforcement Learning framework and modify its discriminator, which identifies agent-generated trajectories, using a quantified measure of the dynamics difference. Training this discriminator yields a transferable reward function suited to the target dynamics, as guaranteed by our theoretical derivation. Off-dynamics Inverse Reinforcement Learning assigns higher rewards to demonstration trajectories that do not exploit discrepancies between the two domains. Extensive experiments on continuous control tasks demonstrate the method's effectiveness and its scalability to high-dimensional tasks. Our code is available on the project website: https://github.com/yachenkang/ODIRL.
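The abstract does not state how the quantified dynamics difference enters the discriminator. As an illustration only, the sketch below assumes an AIRL-style discriminator D = exp(f) / (exp(f) + π(a|s)) and a classifier-based estimate of the dynamics gap Δ(s, a, s') = log p_tgt(s'|s, a) − log p_src(s'|s, a) in the style of off-dynamics RL with domain classifiers. The function names and the additive correction f + Δ are hypothetical; this is not the ODIRL implementation.

```python
import math


def dynamics_gap(logit_sas: float, logit_sa: float) -> float:
    """Estimate delta(s, a, s') = log p_tgt(s'|s,a) - log p_src(s'|s,a).

    Inputs are logits of two hypothetical binary domain classifiers trained on
    equal-sized target/source batches: one sees (s, a, s'), the other (s, a).
    A binary classifier's logit equals log q(tgt|x) - log q(src|x), so by
    Bayes' rule the difference of the two logits cancels the (s, a) marginals
    and the class priors, leaving the log-ratio of the transition densities
    (exact only for optimal classifiers).
    """
    return logit_sas - logit_sa


def corrected_logit(f: float, log_pi: float, delta: float) -> float:
    """Logit of D = exp(f') / (exp(f') + pi(a|s)) with f' = f + delta.

    Adding delta to f is an assumed modification for illustration.
    Since D = sigmoid(f' - log pi), this value is also the AIRL-style
    reward log D - log(1 - D).
    """
    return (f + delta) - log_pi


def discriminator_prob(logit: float) -> float:
    """Numerically stable sigmoid, giving D from its logit."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)
```

Under these assumptions, a transition that is plausible under the source dynamics but unlikely under the target dynamics gets a negative Δ, which lowers D and therefore the reward log D − log(1 − D). This matches the abstract's statement that trajectories exploiting discrepancies between the domains receive lower reward.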

Meta-Reinforcement Learning Algorithm Based on Reward and Dynamic Inference

Off-Dynamics Inverse Reinforcement Learning

Meta-Reinforcement Learning Robust to Distributional Shift Via Performing Lifelong In-Context Learning

Meta-Reinforcement Learning with Dynamic Adaptiveness Distillation

Intrinsically Guided Exploration in Meta Reinforcement Learning

Exploration With Task Information for Meta Reinforcement Learning

DynaMITE-RL: A Dynamic Model for Improved Temporal Meta-Reinforcement Learning

MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration

Enhancing Context-Based Meta-Reinforcement Learning Algorithms Via An Efficient Task Encoder (Student Abstract)

Dream to Adapt: Meta Reinforcement Learning by Latent Context Imagination and MDP Imagination

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

Meta-Reinforcement Learning in Nonstationary and Nonparametric Environments

MAML2: meta reinforcement learning via meta-learning for task categories

Leveraging the Efficiency of Multi-Task Robot Manipulation Via Task-Evoked Planner and Reinforcement Learning

Reward Shaping via Meta-Learning

Learning and Fast Adaptation for Air Combat Decision with Improved Deep Meta-reinforcement Learning

Reinforcement Teaching

Meta-Inverse Reinforcement Learning with Probabilistic Context Variables

Double Meta-Learning for Data Efficient Policy Optimization in Non-Stationary Environments

Meta-Gradient Search Control: A Method for Improving the Efficiency of Dyna-style Planning