Abstract: Imitation learning is a widely used paradigm for decision making that learns from expert demonstrations. Existing imitation algorithms often require multiple interactions between the agent and the environment from which the demonstrations are obtained. Acquiring expert demonstrations in a simulator usually requires specialized knowledge, while real-world interactions are limited by security or cost concerns. Directly applying existing imitation learning algorithms in either the real world or a simulator is therefore not an ideal strategy. In this paper, we propose a cross-domain Inverse Reinforcement Learning training paradigm that learns a reward function from a hetero-domain expert’s demonstrations while limiting interaction with the environment from which the demonstrations are obtained. To address the distribution shift that arises under this paradigm, we propose a transfer learning method called off-dynamics Inverse Reinforcement Learning. The intuition behind it is that the goal of reward function learning is not only to imitate the expert, but also to promote action adaptation to the dynamics difference between the two domains. Specifically, we adopt a widely used Inverse Reinforcement Learning framework and modify its discriminator, which identifies agent-generated trajectories, with the quantified dynamics difference. Training the discriminator yields a transferable reward function suited to the target dynamics, as guaranteed by our theoretical derivation. Off-dynamics Inverse Reinforcement Learning assigns higher rewards to demonstration trajectories that do not exploit discrepancies between the two domains. Extensive experiments on continuous control tasks demonstrate the method's effectiveness and its scalability to high-dimensional tasks. Our code is available on the project website: https://github.com/yachenkang/ODIRL.

New Method of Hierarchical Reinforcement Learning

Hierarchical reinforcement learning with unlimited option scheduling for sparse rewards in continuous spaces

Off-Dynamics Inverse Reinforcement Learning

Autonomous Discovery and Creation of Options in Hierarchical Reinforcement Learning

Option-Based Hierarchical Reinforcement Learning for UAV Multi-Objective Path Planning

Multi-Level Discovery of Deep Options

Online Baum-Welch algorithm for Hierarchical Imitation Learning

Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization

Optimal Exploration Algorithm of Multi-Agent Reinforcement Learning Methods (Student Abstract)

An agent with a sense of direction for option discovery in hierarchical reinforcement learning

Adversarial Option-Aware Hierarchical Imitation Learning

Mitigating Relative Over-Generalization in Multi-Agent Reinforcement Learning

Learning to Interrupt: A Hierarchical Deep Reinforcement Learning Framework for Efficient Exploration

A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning

Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards

Hierarchical Reinforcement Learning in Complex 3D Environments

MQES: Max-Q Entropy Search for Efficient Exploration in Continuous Reinforcement Learning

Deep Hierarchical Reinforcement Learning Algorithm in Partially Observable Markov Decision Processes

Algorithms for Batch Hierarchical Reinforcement Learning

HAVEN: Hierarchical Cooperative Multi-Agent Reinforcement Learning with Dual Coordination Mechanism

State Abstraction in MAXQ Hierarchical Reinforcement Learning