Abstract: Meta-reinforcement learning (meta-RL) aims to learn, from multiple training tasks, the ability to adapt efficiently to unseen test tasks. Despite their success, existing meta-RL algorithms are known to be sensitive to task distribution shift: when the test task distribution differs from the training task distribution, performance may degrade significantly. To address this issue, this paper proposes Model-based Adversarial Meta-Reinforcement Learning (AdMRL), a model-based approach that aims to minimize the worst-case suboptimality gap (the difference between the optimal return and the return that the algorithm achieves after adaptation) across all tasks in a family of tasks. We propose a minimax objective and optimize it by alternating between learning the dynamics model on a fixed task and finding the adversarial task for the current model, that is, the task for which the policy induced by the model is maximally suboptimal. Assuming the family of tasks is parameterized, we derive a formula for the gradient of the suboptimality with respect to the task parameters via the implicit function theorem, and show how the gradient estimator can be efficiently implemented using the conjugate gradient method and a novel use of the REINFORCE estimator. We evaluate our approach on several continuous control benchmarks and demonstrate its advantage over existing state-of-the-art meta-RL algorithms in worst-case performance over all tasks, in generalization to out-of-distribution tasks, and in training-time and test-time sample efficiency.
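To illustrate the kind of computation the abstract refers to, the following is a minimal sketch of differentiating through an inner optimum with the implicit function theorem, using conjugate gradient for the linear solve. It is a toy quadratic problem, not the paper's estimator: the matrices `A`, `B`, `C`, the objectives, and all function names are assumptions made up for this example, and the REINFORCE component is not covered.

```python
import numpy as np

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive definite A, given only v -> A v."""
    x = np.zeros_like(b)
    r = b.copy()          # residual b - A x (x starts at zero)
    p = r.copy()          # search direction
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Hypothetical toy problem: theta plays the role of policy parameters,
# psi the role of task parameters.
rng = np.random.default_rng(0)
n, m = 5, 3
M = rng.normal(size=(n, n))
A = M @ M.T + n * np.eye(n)   # symmetric positive definite
B = rng.normal(size=(n, m))
C = rng.normal(size=(n, m))

def inner_solution(psi):
    # theta*(psi) = argmax_theta  J(theta, psi),
    # with J(theta, psi) = -0.5 theta^T A theta + theta^T B psi.
    return np.linalg.solve(A, B @ psi)

def outer_objective(psi):
    # F(psi) = g(theta*(psi), psi), with g = -0.5 ||theta||^2 + theta^T C psi.
    theta = inner_solution(psi)
    return -0.5 * theta @ theta + theta @ (C @ psi)

def implicit_gradient(psi):
    theta = inner_solution(psi)
    dg_dtheta = -theta + C @ psi
    dg_dpsi = C.T @ theta
    # Implicit function theorem on grad_theta J = 0:
    #   dtheta*/dpsi = -(d2J/dtheta2)^{-1} (d2J/dtheta dpsi) = A^{-1} B.
    # Total gradient: dg_dpsi + B^T A^{-1} dg_dtheta, where A^{-1} dg_dtheta
    # is obtained with matrix-vector products only.
    v = conjugate_gradient(lambda u: A @ u, dg_dtheta)
    return dg_dpsi + B.T @ v

def finite_difference_gradient(f, psi, eps=1e-5):
    # Central differences, used only as a sanity check.
    grad = np.zeros_like(psi)
    for i in range(psi.size):
        e = np.zeros_like(psi)
        e[i] = eps
        grad[i] = (f(psi + e) - f(psi - e)) / (2 * eps)
    return grad

if __name__ == "__main__":
    psi = np.array([0.3, -1.2, 0.7])
    diff = implicit_gradient(psi) - finite_difference_gradient(outer_objective, psi)
    print("max abs difference:", np.max(np.abs(diff)))
```

The point of the conjugate gradient step is that it needs only products with the Hessian, never the Hessian or its inverse in explicit form, which is what makes this style of gradient practical when the inner variable is high-dimensional.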

Hyper-Meta Reinforcement Learning with Sparse Reward

HMRL: Hyper-Meta Learning for Sparse Reward Reinforcement Learning Problem

Meta-Reinforcement Learning Robust to Distributional Shift Via Performing Lifelong In-Context Learning

Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL

MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration

Meta Reinforcement Learning with Task Embedding and Shared Policy

MGHRL: Meta Goal-generation for Hierarchical Reinforcement Learning

A Survey of Meta-Reinforcement Learning

Black box meta-learning intrinsic rewards for sparse-reward environments

NoRML: No-Reward Meta Learning

Hierarchical Multi-Agent Reinforcement Learning for Cooperative Tasks with Sparse Rewards in Continuous Domain

Intrinsically Guided Exploration in Meta Reinforcement Learning

Learn to Effectively Explore in Context-Based Meta-RL

Contextual Policy Transfer in Meta-Reinforcement Learning via Active Learning

Introducing Symmetries to Black Box Meta Reinforcement Learning

Hierarchical Meta-Reinforcement Learning via Automated Macro-Action Discovery

Meta Learning Shared Hierarchies

Reward Shaping via Meta-Learning

Meta-Reinforcement Learning with Dynamic Adaptiveness Distillation

Meta-Learning Integration in Hierarchical Reinforcement Learning for Advanced Task Complexity

Model-based Adversarial Meta-Reinforcement Learning