ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning

Jannis Becktepe, Julian Dierkes, Carolin Benjamins, Aditya Mohan, David Salinas, Raghu Rajan, Frank Hutter, Holger Hoos, Marius Lindauer, Theresa Eimer
2024-09-27
Abstract: Hyperparameters are a critical factor in reliably training well-performing reinforcement learning (RL) agents. Unfortunately, developing and evaluating automated approaches for tuning such hyperparameters is both costly and time-consuming. As a result, such approaches are often only evaluated on a single domain or algorithm, making comparisons difficult and limiting insights into their generalizability. We propose ARLBench, a benchmark for hyperparameter optimization (HPO) in RL that allows comparisons of diverse HPO approaches while being highly efficient in evaluation. To enable research into HPO in RL, even in settings with low compute resources, we select a representative subset of HPO tasks spanning a variety of algorithm and environment combinations. This selection allows for generating a performance profile of an automated RL (AutoRL) method using only a fraction of the compute previously necessary, enabling a broader range of researchers to work on HPO in RL. With the extensive and large-scale dataset on hyperparameter landscapes that our selection is based on, ARLBench is an efficient, flexible, and future-oriented foundation for research on AutoRL. Both the benchmark and the dataset are available at https://github.com/automl/arlbench.
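The idea of a "representative subset" in the abstract can be pictured with a small sketch. The code below is an illustration under stated assumptions, not the selection procedure used by the authors: it assumes a hypothetical table of normalized scores (one row per hyperparameter configuration, one column per environment), filled here with synthetic random numbers, and brute-forces the subset of `k` environments whose mean score correlates best (Pearson) with the mean over all environments. The function names `pearson`, `mean_over_envs`, and `select_subset` are made up for this example.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def mean_over_envs(scores, env_ids):
    """Average each configuration's score over the given environment columns."""
    return [sum(row[e] for e in env_ids) / len(env_ids) for row in scores]


def select_subset(scores, k):
    """Return the k environment indices whose mean score tracks the full mean best."""
    n_envs = len(scores[0])
    full_mean = mean_over_envs(scores, list(range(n_envs)))
    best_subset, best_corr = None, -math.inf
    for subset in itertools.combinations(range(n_envs), k):
        corr = pearson(mean_over_envs(scores, subset), full_mean)
        if corr > best_corr:
            best_subset, best_corr = subset, corr
    return best_subset, best_corr


# Synthetic stand-in data (assumption): 30 configurations x 6 environments.
rng = random.Random(0)
scores = [[rng.random() for _ in range(6)] for _ in range(30)]

subset, corr = select_subset(scores, k=2)
print(f"chosen environments: {subset}, correlation with full mean: {corr:.3f}")
```

A method evaluated only on the chosen columns would then rank configurations similarly to an evaluation on all columns, at a fraction of the cost. Exhaustive search is only practical for small numbers of environments; a greedy or regression-based selection would be needed at larger scale.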
Machine Learning
### What problems does this paper attempt to solve?

This paper addresses several key challenges in automated hyperparameter optimization (HPO) for reinforcement learning (RL):

1. **High evaluation cost**: Developing and evaluating automated hyperparameter tuning methods is both expensive and time-consuming. Existing methods are usually evaluated on only one domain or algorithm, which makes it difficult to compare HPO methods and limits what can be learned about their generalizability.
2. **Lack of standardized benchmarks**: Most current research focuses on specific environments or algorithms. This makes results hard to compare across studies and hinders a comprehensive evaluation of AutoRL methods.
3. **High demand for computing resources**: Existing HPO benchmarks require a large amount of compute, which excludes researchers who do not have access to such resources.

To address these problems, the paper proposes ARLBench, an efficient and flexible HPO benchmark designed specifically for RL. The main objectives of ARLBench include:

- **Providing an efficient HPO benchmark**: Selecting a representative subset of environments reduces the compute required, enabling more researchers to take part in HPO research.
- **Supporting diverse HPO methods**: ARLBench supports static HPO methods as well as methods that configure and adapt hyperparameters dynamically during training, reflecting the non-stationary nature of RL algorithms.
- **Generating large-scale performance data**: Large-scale experiments yield data from more than 100,000 runs, covering multiple RL algorithms, environments, seeds, and configurations, as a resource for future research.

### Specific contributions of ARLBench

1. **Efficient HPO benchmark**: ARLBench can evaluate HPO methods with significantly less compute. For example, evaluating 32 complete RL training sessions requires only 937 GPU hours, compared with 8,163 GPU hours when using StableBaselines3.
2. **Standardized selection of environment subsets**: By selecting a representative subset of environments, ARLBench provides reliable performance estimates across the space of RL tasks, improving the comparability and generalizability of evaluations.
3. **Large-scale performance dataset**: It provides performance data from more than 100,000 runs, covering multiple RL algorithms, environments, seeds, and configurations. These data can help researchers better understand how HPO behaves in RL.
4. **Flexibility and extensibility**: ARLBench supports large configuration spaces and allows hyperparameters to be adjusted dynamically during training, which leaves room for future AutoRL research.

In summary, ARLBench tackles key problems in HPO research for RL by providing an efficient, flexible, and standardized benchmark, supporting further development in this field.
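To put the GPU-hour figures quoted above into perspective, the short calculation below derives the implied speedup and relative saving. It uses only the two numbers from the summary (937 and 8,163 GPU hours); the variable names are illustrative.

```python
# GPU hours quoted in the summary above.
arlbench_gpu_hours = 937
sb3_gpu_hours = 8163

# How many times cheaper the ARLBench evaluation is.
speedup = sb3_gpu_hours / arlbench_gpu_hours

# Fraction of GPU hours saved relative to the StableBaselines3 figure.
saving = 1 - arlbench_gpu_hours / sb3_gpu_hours

print(f"speedup: {speedup:.1f}x, saving: {saving:.1%}")
# prints "speedup: 8.7x, saving: 88.5%"
```

In other words, the quoted figures correspond to roughly a ninth of the compute for the same evaluation.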