Abstract: Imitation learning is a widely used paradigm for decision making that learns from expert demonstrations. Existing imitation learning algorithms often require repeated interaction between the agent and the environment in which the demonstrations were collected. Collecting expert demonstrations in a simulator usually requires specialized knowledge, while interaction in the real world is limited by security or cost concerns. Directly applying existing imitation learning algorithms in either the real world or a simulator alone is therefore not ideal. In this paper, we propose a cross-domain Inverse Reinforcement Learning training paradigm that learns a reward function from the demonstrations of a hetero-domain expert while limiting interaction with the environment in which those demonstrations were obtained. To address the distribution shift that arises under this paradigm, we propose a transfer learning method called off-dynamics Inverse Reinforcement Learning. The intuition is that reward function learning should not only imitate the expert but also encourage actions that adapt to the difference in dynamics between the two domains. Specifically, we adopt a widely used Inverse Reinforcement Learning framework and modify its discriminator, which identifies agent-generated trajectories, with a quantified measure of the dynamics difference. Training this discriminator yields a transferable reward function suited to the target dynamics, as guaranteed by our theoretical derivation. Off-dynamics Inverse Reinforcement Learning assigns higher rewards to demonstration trajectories that do not exploit discrepancies between the two domains. Extensive experiments on continuous control tasks demonstrate the method's effectiveness and its scalability to high-dimensional tasks. Our code is available on the project website: https://github.com/yachenkang/ODIRL.
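
The idea of modifying a discriminator with a quantified dynamics difference can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it is hypothetical. It assumes an adversarial discriminator whose logit is f(s, a, s') - log pi(a|s), and it assumes the dynamics difference is measured as the log-likelihood ratio of a transition under the two domains and enters the logit additively. The abstract does not state the actual functional form.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def base_logit(f_value, log_pi):
    """Assumed adversarial-IRL logit: f(s, a, s') - log pi(a|s)."""
    return f_value - log_pi


def dynamics_gap(log_p_target, log_p_source):
    """Hypothetical dynamics difference for one transition:
    log p_target(s'|s, a) - log p_source(s'|s, a).
    It is zero when both domains agree on the transition."""
    return log_p_target - log_p_source


def corrected_logit(f_value, log_pi, log_p_target, log_p_source):
    """Assumption: the dynamics gap is added to the discriminator logit,
    so transitions that are unlikely under the target dynamics score lower."""
    return base_logit(f_value, log_pi) + dynamics_gap(log_p_target, log_p_source)


def discriminator_loss(expert_logits, agent_logits):
    """Binary cross-entropy with expert samples labeled 1 and agent samples labeled 0."""
    expert_term = sum(softplus(-x) for x in expert_logits) / len(expert_logits)
    agent_term = sum(softplus(x) for x in agent_logits) / len(agent_logits)
    return expert_term + agent_term
```

Under these assumptions, a transition on which the two domains agree keeps its original logit, while a transition that is much less likely under the target dynamics than under the source dynamics receives a lower logit and therefore a lower learned reward.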

Off-Dynamics Inverse Reinforcement Learning
