Abstract: Imitation learning is a widely used paradigm for decision making that learns from expert demonstrations. Existing imitation algorithms often require multiple interactions between the agent and the environment from which the demonstrations are obtained. Acquiring expert demonstrations in a simulator usually requires specialized knowledge, while real-world interactions are limited by security or cost concerns. Therefore, directly applying existing imitation learning algorithms in either the real world or a simulator is not an ideal strategy. In this paper, we propose a cross-domain Inverse Reinforcement Learning training paradigm that learns a reward function from demonstrations given by an expert in a different domain, while limiting interaction with the environment from which the demonstrations are obtained. To address the distribution shift that arises under this paradigm, we propose a transfer learning method called off-dynamics Inverse Reinforcement Learning. The intuition is that the goal of reward function learning is not only to imitate the expert, but also to promote action adaptation to the difference in dynamics between the two domains. Specifically, we adopt a widely used Inverse Reinforcement Learning framework and modify its discriminator, which identifies agent-generated trajectories, with a quantified measure of the dynamics difference. Training the discriminator yields a transferable reward function suited to the target dynamics, which is guaranteed by our theoretical derivation. Off-dynamics Inverse Reinforcement Learning assigns higher rewards to demonstration trajectories that do not exploit discrepancies between the two domains. Extensive experiments on continuous control tasks demonstrate the method's effectiveness and its scalability to high-dimensional tasks. Our code is available on the project website: https://github.com/yachenkang/ODIRL.
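
To make the discriminator modification concrete, the following is a minimal sketch of one plausible reading of the abstract, not the authors' implementation. It assumes an AIRL-style discriminator whose logit is f(s, a, s') - log pi(a|s), and it assumes the dynamics difference is quantified as a log-ratio of transition probabilities between the target and source domains. The function names (`dynamics_gap`, `discriminator`, `reward`), the additive form of the correction, and its sign are all hypothetical; the paper's exact formulation may differ.

```python
import numpy as np


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def dynamics_gap(log_p_target, log_p_source):
    """Assumed measure of the dynamics difference for one transition:
    log p_target(s'|s,a) - log p_source(s'|s,a).
    Negative values mark transitions that are likelier in the source
    domain than in the target domain."""
    return np.asarray(log_p_target) - np.asarray(log_p_source)


def discriminator(f_value, log_pi, delta):
    """AIRL-style discriminator with the logit shifted by the dynamics
    gap (assumed additive form): sigmoid(f + delta - log pi)."""
    logit = np.asarray(f_value) + np.asarray(delta) - np.asarray(log_pi)
    return sigmoid(logit)


def reward(f_value, log_pi, delta):
    """Reward recovered from the discriminator, log D - log(1 - D),
    which equals the logit f + delta - log pi."""
    d = discriminator(f_value, log_pi, delta)
    return np.log(d) - np.log1p(-d)
```

Under these assumptions, a transition with a negative dynamics gap receives a lower reward than an otherwise identical transition with no gap, which matches the abstract's statement that trajectories exploiting discrepancies between the two domains are rewarded less.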

Opponent Cart-Pole Dynamics for Reinforcement Learning of Competing Agents

Off-Dynamics Inverse Reinforcement Learning

Ancillary Mechanism for Autonomous Decision-Making Process in Asymmetric Confrontation: a View from Gomoku

Large Scale Pursuit-Evasion under Collision Avoidance Using Deep Reinforcement Learning

Opponent Modeling in Deep Reinforcement Learning

Achieving Correlated Equilibrium by Studying Opponent's Behavior Through Policy-Based Deep Reinforcement Learning

Adversarial Active Exploration for Inverse Dynamics Model Learning

Modeling and Control Architecture for the Competitive Networked Robot System Based on POMDP

Modeling opponent learning in multiagent repeated games

Influencing Towards Stable Multi-Agent Interactions

Hierarchical Deep Reinforcement Learning Agent with Counter Self-play on Competitive Games

Hierarchical Reinforcement Learning with Opponent Modeling for Distributed Multi-agent Cooperation

Multi-Agent Combat in Non-Stationary Environments

Adversarial Decision-Making for Moving Target Defense: A Multi-Agent Markov Game and Reinforcement Learning Approach

All by Myself: Learning Individualized Competitive Behaviour with a Contrastive Reinforcement Learning optimization

Opponent portrait for multiagent reinforcement learning in competitive environment

A Deep Reinforcement Learning-Based Method Applied for Solving Multi-Agent Defense and Attack Problems

Learning to Model Opponent Learning

Differential Game-Based Deep Reinforcement Learning in Underwater Target Hunting Task

Cooperative Learning of Multi-Agent Systems Via Reinforcement Learning

Evolutionary Game Dynamics of Multi-Agent Cooperation Driven by Self-Learning