Abstract: Imitation learning is a widely used paradigm for decision making that learns from expert demonstrations. Existing imitation algorithms often require multiple interactions between the agent and the environment from which the demonstrations are obtained. Acquiring expert demonstrations in a simulator usually requires specialized knowledge. In addition, real-world interactions are limited due to security or cost concerns. Therefore, directly applying existing imitation learning algorithms in either the real world or a simulator is not an ideal strategy. In this paper, we propose a cross-domain Inverse Reinforcement Learning training paradigm that learns a reward function from an expert's demonstrations collected in a different domain, while limiting interaction with the environment in which the demonstrations are obtained. To address the distribution shift that arises under this paradigm, we propose a transfer learning method called off-dynamics Inverse Reinforcement Learning. The intuition behind off-dynamics Inverse Reinforcement Learning is that the goal of reward function learning is not only to imitate the expert, but also to encourage actions that adapt to the dynamics difference between the two domains. Specifically, we adopt a widely used Inverse Reinforcement Learning framework and modify its discriminator, which identifies agent-generated trajectories, with a quantified measure of the dynamics difference. Training the discriminator yields a transferable reward function suited to the target dynamics, as guaranteed by our theoretical derivation. Off-dynamics Inverse Reinforcement Learning assigns higher rewards to demonstration trajectories that do not exploit discrepancies between the two domains. Extensive experiments on continuous control tasks demonstrate the method's effectiveness and its scalability to high-dimensional tasks. Our code is available on the project website: https://github.com/yachenkang/ODIRL.
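To make the idea of a discriminator "modified with quantified dynamic differences" concrete, the following is a minimal sketch and not the paper's implementation. It assumes an adversarial discriminator of the common form D = exp(f) / (exp(f) + π(a|s)) and assumes the dynamics difference is estimated from two domain classifiers as a log-ratio of transition probabilities. The abstract does not state where, or with what sign, the correction enters, so adding it to the discriminator logit is purely illustrative. All function names (`dynamics_gap`, `discriminator_prob`, `discriminator_loss`) are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))


def dynamics_gap(logit_sas, logit_sa):
    """Estimate log p_target(s'|s,a) - log p_source(s'|s,a).

    Assumption: logit_sas and logit_sa are the outputs of two binary domain
    classifiers (target vs. source), conditioned on (s, a, s') and on (s, a).
    By Bayes' rule the difference of the two logits equals the log-ratio of
    the transition probabilities under the two dynamics.
    """
    return np.asarray(logit_sas, dtype=float) - np.asarray(logit_sa, dtype=float)


def discriminator_prob(f_value, log_pi, gap):
    """Probability that a transition comes from the expert rather than the agent.

    With gap = 0 this is exp(f) / (exp(f) + pi(a|s)). Adding `gap` to the
    logit is an illustrative assumption, not the paper's confirmed form.
    """
    logit = np.asarray(f_value, dtype=float) - np.asarray(log_pi, dtype=float) + gap
    return sigmoid(logit)


def discriminator_loss(d_expert, d_agent, eps=1e-8):
    """Binary cross-entropy: expert samples labelled 1, agent samples labelled 0."""
    d_expert = np.asarray(d_expert, dtype=float)
    d_agent = np.asarray(d_agent, dtype=float)
    return -(np.mean(np.log(d_expert + eps)) + np.mean(np.log(1.0 - d_agent + eps)))
```

In this sketch, a transition that is more likely under the target dynamics than under the source dynamics has a positive gap and so receives a larger discriminator logit. Whether the actual method uses this sign convention should be checked against the paper and the linked repository.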

Off-Dynamics Inverse Reinforcement Learning
