Abstract: Offline reinforcement learning algorithms hold the promise of enabling data-driven RL methods that do not require costly or dangerous real-world exploration and benefit from large pre-collected datasets. This in turn can facilitate real-world applications, as well as a more standardized approach to RL research. Furthermore, offline RL methods can provide effective initializations for online fine-tuning to overcome challenges with exploration. However, evaluating progress on offline RL algorithms requires effective and challenging benchmarks that capture properties of real-world tasks, provide a range of task difficulties, and cover a range of challenges both in terms of the parameters of the domain (e.g., length of the horizon, sparsity of rewards) and the parameters of the data (e.g., narrow demonstration data or broad exploratory data). While considerable progress in offline RL in recent years has been enabled by simpler benchmark tasks, the most widely used datasets are increasingly saturating in performance and may fail to reflect properties of realistic tasks. We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments, based on models of real-world robotic systems, and comprising a variety of data sources, including scripted data, play-style data collected by human teleoperators, and other data sources. Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation, with some of the tasks specifically designed to require both pre-training and fine-tuning. We hope that our proposed benchmark will facilitate further progress on both offline RL and fine-tuning algorithms. A website with code, examples, tasks, and data is available at https://sites.google.com/view/d5rl/

Toybox: A Suite of Environments for Experimental Evaluation of Deep Reinforcement Learning

Jumanji: a Diverse Suite of Scalable Reinforcement Learning Environments in JAX

MDP Playground: An Analysis and Debug Testbed for Reinforcement Learning

Benchmarking Safe Exploration in Deep Reinforcement Learning

A Benchmark Environment Motivated by Industrial Control Problems

rl_reach: Reproducible Reinforcement Learning Experiments for Robotic Reaching Tasks

Robot Air Hockey: A Manipulation Testbed for Robot Learning with Reinforcement Learning

OffWorld Gym: open-access physical robotics environment for real-world reinforcement learning benchmark and research

CaiRL: A High-Performance Reinforcement Learning Environment Toolkit

An Optical Control Environment for Benchmarking Reinforcement Learning Algorithms

Exploring Exploration: Comparing Children with RL Agents in Unified Environments

Discovering Minimal Reinforcement Learning Environments

Is Deep Reinforcement Learning Really Superhuman on Atari? Leveling the playing field

D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning

DoorGym: A Scalable Door Opening Environment And Baseline Agent

DIAMBRA Arena: a New Reinforcement Learning Platform for Research and Experimentation

SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning

Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning

DeepMind Lab2D

EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine

Eden: A Unified Environment Framework for Booming Reinforcement Learning Algorithms