MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning

Zohar Rimon, Tom Jurgenson, Orr Krupnik, Gilad Adler, Aviv Tamar
2024-03-15
Abstract: Meta-reinforcement learning (meta-RL) is a promising framework for tackling challenging domains requiring efficient exploration. Existing meta-RL algorithms are characterized by low sample efficiency, and mostly focus on low-dimensional task distributions. In parallel, model-based RL methods have been successful in solving partially observable MDPs, of which meta-RL is a special case. In this work, we leverage this success and propose a new model-based approach to meta-RL, based on elements from existing state-of-the-art model-based and meta-RL methods. We demonstrate the effectiveness of our approach on common meta-RL benchmark domains, attaining greater return with better sample efficiency (up to $15\times$) while requiring very little hyperparameter tuning. In addition, we validate our approach on a slate of more challenging, higher-dimensional domains, taking a step towards real-world generalizing agents.
Machine Learning
What problem does this paper attempt to address?
The paper addresses two weaknesses of meta-reinforcement learning (meta-RL): low sample efficiency and limited generalization across tasks. Existing meta-RL algorithms need many samples in domains that require efficient exploration, and they mostly focus on low-dimensional task distributions. Meanwhile, model-based reinforcement learning methods have been successful in solving partially observable Markov decision processes (POMDPs), of which meta-RL is a special case.

To close this gap, the paper proposes MAMBA (MetA-RL Model-Based Algorithm), a model-based approach that combines elements of state-of-the-art model-based and meta-RL methods. On common meta-RL benchmark domains, MAMBA attains higher returns with better sample efficiency (up to 15 times) while requiring very little hyperparameter tuning. It is also validated on a set of more challenging, higher-dimensional task distributions, a step towards agents that generalize in real-world settings.

The main contributions of the paper are:
1. A unified formulation that relates context-based meta-RL algorithms to the Dreamer line of work, highlighting their similarities and differences.
2. MAMBA, a sample-efficient meta-RL method that outperforms a variety of baselines on several meta-RL benchmarks with very little hyperparameter tuning.
3. A class of meta-RL environments with many degrees of freedom that existing meta-RL methods struggle to solve, together with a theoretical analysis of these domains and a demonstration that MAMBA solves them efficiently.
Through these contributions, the paper improves the returns and sample efficiency of meta-RL on the evaluated benchmarks and takes a step towards applying it to more complex, higher-dimensional tasks.
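To make the statement that meta-RL is a special case of a POMDP more concrete, the following is a minimal toy sketch in Python. It is not MAMBA and does not learn a world model: the environment, the function names (HiddenGoalEnv, infer_goal, context_policy, run_meta_episode) and the hand-written policy are all hypothetical illustrations. The task (which end of a short track holds the goal) is a hidden state variable that the agent never observes; it can only infer it from the history of observations, actions, and rewards, which is carried across episode resets within the same task.

```python
from dataclasses import dataclass


@dataclass
class HiddenGoalEnv:
    """Toy 1-D task family (hypothetical). The goal is the hidden task variable."""
    goal: int          # 0 or size - 1; never shown to the agent
    size: int = 5      # positions 0 .. size - 1
    pos: int = 0

    def reset(self):
        self.pos = self.size // 2   # every episode starts in the middle
        return self.pos             # observation: position only

    def step(self, action):
        # action is -1, 0 or +1; the position is clipped to the track
        self.pos = max(0, min(self.size - 1, self.pos + action))
        reward = 1.0 if self.pos == self.goal else 0.0
        return self.pos, reward


def infer_goal(context, size):
    """Infer the hidden goal from the history, or return None if still unknown."""
    for _obs, _action, reward, next_obs in context:
        if reward > 0:
            return next_obs          # the goal was reached at this position
        if next_obs == size - 1:
            return 0                 # right end gave no reward, so goal is left
    return None


def context_policy(obs, context, size=5):
    """Hand-written stand-in for a learned context-based policy."""
    goal = infer_goal(context, size)
    if goal is None:
        return 1                     # explore: walk right first
    return (goal > obs) - (goal < obs)   # exploit: move toward (or stay on) the goal


def run_meta_episode(env, policy, episodes=2, horizon=6):
    """Run several episodes in the SAME task, keeping the context across resets."""
    context, returns = [], []
    for _ in range(episodes):
        obs, episode_return = env.reset(), 0.0
        for _ in range(horizon):
            action = policy(obs, context)
            next_obs, reward = env.step(action)
            context.append((obs, action, reward, next_obs))
            episode_return += reward
            obs = next_obs
        returns.append(episode_return)
    return returns


if __name__ == "__main__":
    print(run_meta_episode(HiddenGoalEnv(goal=0), context_policy))
    print(run_meta_episode(HiddenGoalEnv(goal=4), context_policy))
```

With the goal at the left end, the first episode spends most of its steps exploring and returns 1.0, while the second episode reuses the context and returns 5.0; with the goal at the right end, both episodes return 5.0. The gap between the first and second episode in the harder task is the kind of within-task adaptation that meta-RL methods aim to learn rather than hard-code.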