XLand-MiniGrid: Scalable Meta-Reinforcement Learning Environments in JAX

Alexander Nikulin, Vladislav Kurenkov, Ilya Zisman, Artem Agarkov, Viacheslav Sinii, Sergey Kolesnikov
2024-06-10
Abstract: Inspired by the diversity and depth of XLand and the simplicity and minimalism of MiniGrid, we present XLand-MiniGrid, a suite of tools and grid-world environments for meta-reinforcement learning research. Written in JAX, XLand-MiniGrid is designed to be highly scalable and can potentially run on GPU or TPU accelerators, democratizing large-scale experimentation with limited resources. Along with the environments, XLand-MiniGrid provides pre-sampled benchmarks with millions of unique tasks of varying difficulty and easy-to-use baselines that allow users to quickly start training adaptive agents. In addition, we have conducted a preliminary analysis of scaling and generalization, showing that our baselines are capable of reaching millions of steps per second during training and validating that the proposed benchmarks are challenging.
Machine Learning
What problem does this paper attempt to address?
This paper introduces XLand-MiniGrid, a JAX-based suite of tools and grid-world environments for meta-reinforcement learning (meta-RL). It targets two persistent problems in RL: low sample efficiency and overfitting. Meta-RL addresses them by pre-training agents on a distribution of tasks so that they adapt to new problems with fewer samples. Current meta-RL methods, however, need a large number of distinct tasks for pre-training, which can be infeasible for research labs and practitioners with limited resources. XLand-MiniGrid combines the diversity and depth of XLand with the simplicity of MiniGrid, using a scalable system of rules and goals to generate diverse task distributions. It is designed to be highly scalable and can run on GPU or TPU accelerators, which makes large-scale experiments easier to carry out. The paper also provides pre-sampled benchmarks containing millions of unique tasks, along with easy-to-use baselines for quickly training adaptive agents. A preliminary analysis of scaling and generalization shows that the baselines can reach millions of steps per second during training and that the proposed benchmarks are challenging. The existing baselines still leave room for improvement, particularly in generalization to new tasks. In summary, XLand-MiniGrid is an open-source meta-RL research library intended to make large-scale experiments accessible under limited resources and to support research into the limits and scalability of RL algorithms.
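To illustrate why an environment written in JAX can scale this way, here is a minimal sketch of the general pattern: a pure single-environment step function that is batched with `jax.vmap` and compiled with `jax.jit`. It is a toy grid world written for this summary and does not use the XLand-MiniGrid API; `GRID_SIZE`, `MOVES`, `step`, and `batched_step` are hypothetical names, and the per-environment goal is only a stand-in for the library's rule and goal system.

```python
import jax
import jax.numpy as jnp

GRID_SIZE = 5  # hypothetical toy grid size, not an XLand-MiniGrid setting
# Action index -> (row, col) displacement: up, right, down, left.
MOVES = jnp.array([[-1, 0], [0, 1], [1, 0], [0, -1]])


def step(pos, goal, action):
    """Advance one toy environment by one step (pure function, no side effects)."""
    # Move the agent and keep it inside the grid.
    new_pos = jnp.clip(pos + MOVES[action], 0, GRID_SIZE - 1)
    # Reward 1.0 only when the agent lands on its goal cell.
    reward = jnp.all(new_pos == goal).astype(jnp.float32)
    return new_pos, reward


# vmap batches the single-environment step over a leading axis;
# jit compiles the batched function for whatever backend is available.
batched_step = jax.jit(jax.vmap(step))

num_envs = 1024
key_pos, key_goal, key_act = jax.random.split(jax.random.PRNGKey(0), 3)
positions = jax.random.randint(key_pos, (num_envs, 2), 0, GRID_SIZE)
goals = jax.random.randint(key_goal, (num_envs, 2), 0, GRID_SIZE)  # one toy "task" per env
actions = jax.random.randint(key_act, (num_envs,), 0, 4)

# A single call steps all environments in parallel.
positions, rewards = batched_step(positions, goals, actions)
```

Because the step function is pure and operates on arrays, the same code runs unchanged on CPU, GPU, or TPU, and the number of parallel environments is just the size of the leading array axis.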