XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning

Alexander Nikulin, Ilya Zisman, Alexey Zemtsov, Viacheslav Sinii, Vladislav Kurenkov, Sergey Kolesnikov
2024-06-13
Abstract: Following the success of the in-context learning paradigm in large-scale language and computer vision models, the recently emerging field of in-context reinforcement learning is experiencing rapid growth. However, its development has been held back by the lack of challenging benchmarks, as all experiments so far have been carried out in simple environments and on small-scale datasets. We present XLand-100B, a large-scale dataset for in-context reinforcement learning based on the XLand-MiniGrid environment, as a first step to alleviate this problem. It contains complete learning histories for nearly 30,000 different tasks, covering 100B transitions and 2.5B episodes. It took 50,000 GPU hours to collect the dataset, which is beyond the reach of most academic labs. Along with the dataset, we provide the utilities to reproduce or expand it even further. With this substantial effort, we aim to democratize research in the rapidly growing field of in-context reinforcement learning and provide a solid foundation for further scaling. The code is open-source and available under the Apache 2.0 licence at https://github.com/dunno-lab/xland-minigrid-datasets.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper presents XLand-100B, a large-scale multi-task dataset for in-context reinforcement learning. Although in-context learning has advanced quickly in language and computer vision, in-context reinforcement learning has been held back by the lack of challenging benchmarks: existing reinforcement learning datasets cover too few tasks to train models capable of in-context learning. XLand-100B consists of complete learning histories for nearly 30,000 different tasks, covering 100 billion transitions and 2.5 billion episodes. Collecting it required 50,000 GPU hours, which is beyond the reach of most academic labs. The dataset is compatible with a range of in-context reinforcement learning methods and ships with utilities for reproducing or extending it, laying a foundation for further scaling. The paper also evaluates two existing in-context reinforcement learning methods, Algorithm Distillation (AD) and Decision-Pretrained Transformer (DPT), and describes the data collection process, data format, data quality, and applicability. The experiments show that current methods still leave considerable room for improvement on complex tasks, which indicates that substantial research remains to be done. Overall, the paper addresses the lack of large-scale, diverse datasets for in-context reinforcement learning, with the aim of driving research in the field and improving the generalization ability of models.
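To give a feel for the scale, the headline figures can be turned into rough per-task and per-episode averages. The sketch below is back-of-the-envelope arithmetic only: it uses the rounded numbers quoted above (taking "nearly 30,000" as 30,000), the function name `scale_summary` is hypothetical, and the results are averages that say nothing about how episode lengths or history sizes actually vary across tasks.

```python
# Rounded headline figures quoted in the abstract (assumption: "nearly 30,000" ~ 30,000).
NUM_TASKS = 30_000
NUM_TRANSITIONS = 100_000_000_000  # 100B
NUM_EPISODES = 2_500_000_000       # 2.5B
GPU_HOURS = 50_000


def scale_summary(tasks, transitions, episodes, gpu_hours):
    """Return rough average ratios derived from the headline totals."""
    return {
        "transitions_per_episode": transitions / episodes,
        "episodes_per_task": episodes / tasks,
        "transitions_per_task": transitions / tasks,
        "transitions_per_gpu_hour": transitions / gpu_hours,
    }


summary = scale_summary(NUM_TASKS, NUM_TRANSITIONS, NUM_EPISODES, GPU_HOURS)
for name, value in summary.items():
    print(f"{name}: {value:,.0f}")
```

On these rounded inputs the averages come out to about 40 transitions per episode and roughly 3.3 million transitions per task. The per-GPU-hour ratio is only a crude indicator, since the collection budget covers more than stepping the environment.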