Abstract: Meta-reinforcement learning (meta-RL) is a promising framework for tackling challenging domains requiring efficient exploration. Existing meta-RL algorithms are characterized by low sample efficiency, and mostly focus on low-dimensional task distributions. In parallel, model-based RL methods have been successful in solving partially observable MDPs, of which meta-RL is a special case. In this work, we leverage this success and propose a new model-based approach to meta-RL, based on elements from existing state-of-the-art model-based and meta-RL methods. We demonstrate the effectiveness of our approach on common meta-RL benchmark domains, attaining greater return with better sample efficiency (up to 15×) while requiring very little hyperparameter tuning. In addition, we validate our approach on a slate of more challenging, higher-dimensional domains, taking a step towards real-world generalizing agents.
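The abstract's statement that meta-RL is a special case of a POMDP can be made concrete with a toy example. The sketch below is hypothetical: GoalTask, HiddenGoalEnv, sample_task and rollout are illustrative names, not code from the paper. The task parameter (a goal position) is part of the hidden state, so two tasks can produce identical observations under the same actions while yielding different rewards, and the agent has to infer the task from its history of observations and rewards.

```python
import random
from dataclasses import dataclass


@dataclass
class GoalTask:
    goal: float  # hidden task parameter: part of the POMDP state, never observed


class HiddenGoalEnv:
    """Hypothetical 1-D point environment whose reward depends on an unseen goal."""

    def __init__(self, task: GoalTask):
        self.task = task
        self.pos = 0.0

    def reset(self) -> float:
        self.pos = 0.0
        return self.pos  # the observation is the position only, not the goal

    def step(self, action: float) -> tuple[float, float]:
        self.pos += max(-1.0, min(1.0, action))  # clip the action to [-1, 1]
        reward = -abs(self.pos - self.task.goal)  # reward reveals the goal only indirectly
        return self.pos, reward


def sample_task(rng: random.Random) -> GoalTask:
    """Draw a task from an assumed uniform task distribution over goals in [-5, 5]."""
    return GoalTask(goal=rng.uniform(-5.0, 5.0))


def rollout(env: HiddenGoalEnv, actions: list[float]) -> list[tuple[float, float]]:
    """Reset the environment and return the (observation, reward) pair for each action."""
    env.reset()
    return [env.step(a) for a in actions]
```

For example, rolling out the actions [1.0, 1.0] in tasks with goals 3.0 and -3.0 yields the same observations (1.0, then 2.0) but different rewards, so no policy that sees only the current observation can act optimally in both tasks.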

What problem does this paper attempt to address?

The paper addresses the problem of improving sample efficiency and task generalization in meta-reinforcement learning (meta-RL). Existing meta-RL algorithms have low sample efficiency in complex domains that require efficient exploration, and they mostly focus on low-dimensional task distributions. Meanwhile, model-based reinforcement learning methods have succeeded in solving partially observable Markov decision processes (POMDPs), of which meta-RL is a special case.

To overcome these challenges, the paper proposes a new model-based approach, MAMBA (MetA-RL Model-Based Algorithm), which combines elements of state-of-the-art model-based and meta-RL methods. On common meta-RL benchmark domains, MAMBA attains higher returns with better sample efficiency (up to 15 times) while requiring very little hyperparameter tuning. MAMBA is also validated on a series of more challenging, higher-dimensional task distributions, taking a step towards agents that generalize to real-world scenarios.

Specifically, the main contributions of the paper are:
1. A unified formulation of context-based meta-RL algorithms and the Dreamer line of work, highlighting their similarities and differences.
2. MAMBA, a sample-efficient meta-RL method that outperforms various baselines on multiple meta-RL benchmarks with very little hyperparameter tuning.
3. A class of meta-RL environments with many degrees of freedom that are difficult for existing meta-RL methods to solve. The paper provides a theoretical analysis of these domains and demonstrates that MAMBA solves them efficiently.
Together, these contributions improve the returns and sample efficiency of meta-RL and extend it to higher-dimensional task distributions, a step towards solving complex real-world tasks.
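Contribution 1 relates context-based meta-RL to Dreamer-style world models; in both, the agent conditions on a summary of its interaction history that acts as a belief over the hidden task. As a minimal sketch of what such a summary approximates, the functions below compute an exact Bayesian belief over a small, discrete set of candidate goals for a toy reward of -|pos - goal|, assuming Gaussian reward noise. The names update_belief and filter_history, the reward model, and the noise level are all assumptions for illustration; learned methods approximate this belief with a recurrent encoder or world model rather than computing it in closed form.

```python
import math


def update_belief(belief: list[float], candidates: list[float],
                  pos: float, reward: float, noise_std: float = 0.5) -> list[float]:
    """One Bayes-filter step: posterior over candidate goals after observing (pos, reward).

    Assumes reward = -|pos - goal| + Gaussian noise with std `noise_std`,
    and that `belief` puts positive mass on at least one candidate.
    """
    log_w = []
    for prior, goal in zip(belief, candidates):
        mean = -abs(pos - goal)
        # Gaussian log-likelihood of the observed reward, up to an additive constant.
        log_lik = -0.5 * ((reward - mean) / noise_std) ** 2
        log_w.append(math.log(prior) + log_lik if prior > 0 else -math.inf)
    m = max(log_w)  # subtract the max before exponentiating, for numerical stability
    w = [math.exp(x - m) for x in log_w]
    total = sum(w)
    return [x / total for x in w]


def filter_history(candidates: list[float], history: list[tuple[float, float]],
                   noise_std: float = 0.5) -> list[float]:
    """Fold a history of (pos, reward) pairs into a belief, starting from a uniform prior."""
    belief = [1.0 / len(candidates)] * len(candidates)
    for pos, reward in history:
        belief = update_belief(belief, candidates, pos, reward, noise_std)
    return belief
```

With candidates [-4.0, 0.0, 4.0], a single observation at pos = 2.0 with reward -2.0 cannot distinguish goals 0.0 and 4.0 (both predict the same reward), while an earlier observation at pos = 1.0 with reward -3.0 concentrates the belief on 4.0. Which states the agent visits therefore determines how quickly the task is identified, which is why efficient exploration matters in meta-RL.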

MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning