Abstract: Imitation learning is a widely used paradigm for decision making that learns from expert demonstrations. Existing imitation learning algorithms often require many interactions between the agent and the environment from which the demonstrations are obtained. However, acquiring expert demonstrations in a simulator usually requires specialized knowledge, and real-world interactions are limited by safety or cost concerns. Therefore, directly applying existing imitation learning algorithms in either the real world or a simulator is not an ideal strategy. In this paper, we propose a cross-domain Inverse Reinforcement Learning (IRL) training paradigm that learns a reward function from expert demonstrations collected in a different domain, while interactions with the environment from which the demonstrations are obtained remain limited. To address the distribution shift under this paradigm, we propose a transfer learning method called off-dynamics Inverse Reinforcement Learning. The intuition behind off-dynamics IRL is that reward learning should not only imitate the expert but also encourage the agent to adapt its actions to the dynamics difference between the two domains. Specifically, we adopt a widely used IRL framework and modify its discriminator, which identifies agent-generated trajectories, with a quantified measure of the dynamics difference. Training the discriminator yields a transferable reward function suited to the target dynamics, as guaranteed by our theoretical derivation. Off-dynamics IRL assigns higher rewards to demonstration trajectories that do not exploit discrepancies between the two domains. Extensive experiments on continuous control tasks demonstrate the method's effectiveness and its scalability to high-dimensional tasks. Our code is available on the project website: https://github.com/yachenkang/ODIRL.
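The abstract does not name the IRL framework or say exactly how the dynamics difference enters the discriminator. The following is only a minimal sketch built on two assumptions. First, it assumes an AIRL-style discriminator D = exp(f) / (exp(f) + π(a|s)). Second, it assumes a DARC-style (Domain Adaptation with Rewards from Classifiers) estimate of the dynamics gap log p_target(s'|s,a) − log p_source(s'|s,a), computed from two domain classifiers. Adding that gap to the discriminator logit is a hypothetical choice made for illustration and is not necessarily the ODIRL formulation. The names `Transition`, `dynamics_gap`, `discriminator_logit`, and `reward` are likewise hypothetical, and the classifier logits are treated as already-computed inputs.

```python
import math
from dataclasses import dataclass


@dataclass
class Transition:
    """Hypothetical per-transition quantities, all assumed to be precomputed."""
    f_value: float           # learned term f(s, a, s') inside the AIRL-style discriminator
    log_pi: float            # log pi(a | s) under the current policy
    logit_target_sas: float  # logit of domain classifier q(target | s, a, s')
    logit_target_sa: float   # logit of domain classifier q(target | s, a)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dynamics_gap(t: Transition) -> float:
    """DARC-style estimate of log p_target(s'|s,a) - log p_source(s'|s,a).

    By Bayes' rule this equals the log-odds of 'target' given (s, a, s')
    minus the log-odds of 'target' given (s, a) alone.
    """
    return t.logit_target_sas - t.logit_target_sa


def discriminator_logit(t: Transition, use_gap: bool = True) -> float:
    """Logit of D = exp(f) / (exp(f) + pi(a|s)), i.e. f - log pi(a|s).

    HYPOTHETICAL modification: add the dynamics gap, so transitions that are
    likelier under source dynamics than under target dynamics look less expert-like.
    """
    logit = t.f_value - t.log_pi
    if use_gap:
        logit += dynamics_gap(t)
    return logit


def discriminator_prob(t: Transition, use_gap: bool = True) -> float:
    """Probability that the discriminator labels the transition as expert."""
    return sigmoid(discriminator_logit(t, use_gap))


def reward(t: Transition, use_gap: bool = True) -> float:
    """AIRL-style reward log D - log(1 - D), which equals the discriminator logit."""
    return discriminator_logit(t, use_gap)


# Usage: identical f and policy terms, different dynamics-gap estimates.
consistent = Transition(f_value=1.0, log_pi=-0.5, logit_target_sas=0.2, logit_target_sa=0.2)
exploiting = Transition(f_value=1.0, log_pi=-0.5, logit_target_sas=-2.0, logit_target_sa=0.2)
print(reward(consistent), reward(exploiting))
```

With `use_gap=False`, both transitions receive the same reward. With the gap term enabled, the transition that the classifiers judge likelier under source dynamics is penalized. This mirrors, under the stated assumptions, the abstract's claim that higher rewards go to trajectories that do not exploit discrepancies between the two domains.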

Data-Incremental Continual Offline Reinforcement Learning

Off-Dynamics Inverse Reinforcement Learning

OER: Offline Experience Replay for Continual Offline Reinforcement Learning

Solving Continual Offline Reinforcement Learning with Decision Transformer

Forget but Recall: Incremental Latent Rectification in Continual Learning

Replay-enhanced Continual Reinforcement Learning

Overcoming Domain Drift in Online Continual Learning

Continual Offline Reinforcement Learning via Diffusion-based Dual Generative Replay

Lifelong Incremental Reinforcement Learning with Online Bayesian Inference

Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal

Continual Learning Through Retrieval and Imagination

ROER: Regularized Optimal Experience Replay

Contrastive Correlation Preserving Replay for Online Continual Learning

Residual Learning and Context Encoding for Adaptive Offline-to-Online Reinforcement Learning

Squeezing More Past Knowledge for Online Class-Incremental Continual Learning

R-DFCIL: Relation-Guided Representation Learning for Data-Free Class Incremental Learning

Improving Plasticity in Online Continual Learning via Collaborative Learning

Curriculum Offline Reinforcement Learning

Rehearsal-free Federated Domain-incremental Learning

AOCIL: Exemplar-free Analytic Online Class Incremental Learning with Low Time and Resource Consumption

ODICE: Revealing the Mystery of Distribution Correction Estimation via Orthogonal-gradient Update