Quantity vs. Quality: On Hyperparameter Optimization for Deep Reinforcement Learning

Lars Hertel, Pierre Baldi, Daniel L. Gillen

DOI: https://doi.org/10.48550/arXiv.2007.14604

2020-07-30

Abstract: Reinforcement learning algorithms can show strong variation in performance between training runs with different random seeds. In this paper we explore how this affects hyperparameter optimization when the goal is to find hyperparameter settings that perform well across random seeds. In particular, we benchmark whether it is better to explore a large quantity of hyperparameter settings via pruning of bad performers, or if it is better to aim for quality of collected results by using repetitions. For this we consider the Successive Halving, Random Search, and Bayesian Optimization algorithms, the latter two with and without repetitions. We apply these to tuning the PPO2 algorithm on the Cartpole balancing task and the Inverted Pendulum Swing-up task. We demonstrate that pruning may negatively affect the optimization and that repeated sampling does not help in finding hyperparameter settings that perform better across random seeds. From our experiments we conclude that Bayesian optimization with a noise-robust acquisition function is the best choice for hyperparameter optimization in reinforcement learning tasks.
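The quantity-versus-quality trade-off in the abstract can be illustrated with a minimal sketch of random search under a fixed budget of training runs. Everything here is an assumption made for illustration: `noisy_score` is a hypothetical stand-in for one RL training run, the single hyperparameter `lr` and the noise level are invented, and none of it reproduces the paper's experimental setup.

```python
import random
import statistics

def noisy_score(lr, rng):
    """Hypothetical stand-in for one training run: the true quality
    peaks at lr = 0.3, and Gaussian noise mimics seed-to-seed variation."""
    return -(lr - 0.3) ** 2 + rng.gauss(0.0, 0.05)

def random_search(budget, repetitions, rng):
    """Spend `budget` training runs in total, evaluating each sampled
    setting `repetitions` times and ranking settings by mean score."""
    n_settings = budget // repetitions
    results = []
    for _ in range(n_settings):
        lr = rng.uniform(0.0, 1.0)
        scores = [noisy_score(lr, rng) for _ in range(repetitions)]
        results.append((statistics.mean(scores), lr))
    _, best_lr = max(results)
    return best_lr, n_settings

rng = random.Random(0)
# Quantity: 60 different settings, one noisy evaluation each.
lr_many, n_many = random_search(budget=60, repetitions=1, rng=rng)
# Quality: 20 settings, three evaluations each, averaged.
lr_few, n_few = random_search(budget=60, repetitions=3, rng=rng)
```

With the same budget, repetitions reduce the noise in each estimate but cut the number of settings explored by the repetition factor. Which side wins depends on the noise level and the objective, which is the question the paper benchmarks.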

Machine Learning, Artificial Intelligence

What problem does this paper attempt to address?

The paper asks how to tune hyperparameters in deep reinforcement learning so that the chosen settings perform well across random seeds, given that performance can vary strongly between training runs. The authors compare Random Search, the Successive Halving Algorithm (SHA), and Bayesian Optimization, and test whether repeated evaluation of each setting helps find better configurations. They find that Bayesian Optimization with a noise-robust acquisition function (such as qNEI) performs best. Exploring more configurations by pruning poor performers, as SHA does, does not outperform Bayesian Optimization and may negatively affect the optimization. Evaluating each hyperparameter setting several times likewise does not give better results than a single evaluation. The paper therefore recommends Bayesian Optimization with a noise-robust acquisition function as the preferred method for hyperparameter optimization in reinforcement learning tasks.
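For readers unfamiliar with pruning, the following is a minimal sketch of successive halving on a toy problem. It is not the authors' implementation: `partial_score`, the latent `quality` value standing in for a hyperparameter setting, and the noise level are all hypothetical, and each rung re-evaluates from scratch rather than resuming training from a checkpoint.

```python
import random

def partial_score(quality, steps, rng):
    """Hypothetical noisy learning curve: the score approaches the
    setting's latent `quality` as the number of training steps grows."""
    return quality * (1 - 0.5 ** (steps / 10)) + rng.gauss(0.0, 0.05)

def successive_halving(n_configs, min_steps, eta, rng):
    """Start many settings on a small budget, keep the top 1/eta at each
    rung, and multiply the per-setting budget by eta for the survivors."""
    configs = [rng.uniform(0.0, 1.0) for _ in range(n_configs)]
    steps = min_steps
    total_cost = 0
    while len(configs) > 1:
        scored = [(partial_score(q, steps, rng), q) for q in configs]
        total_cost += steps * len(configs)
        scored.sort(reverse=True)  # best observed score first
        keep = max(1, len(configs) // eta)
        configs = [q for _, q in scored[:keep]]
        steps *= eta
    return configs[0], total_cost

rng = random.Random(0)
best_quality, total_cost = successive_halving(
    n_configs=16, min_steps=5, eta=2, rng=rng
)
```

Because the early rungs rank settings from short, noisy evaluations, a setting that would have performed well after full training can be discarded on a single unlucky run. This is one way pruning can work against the optimization when seed-to-seed variation is large.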

Hyperparameters in Reinforcement Learning and How To Tune Them

Generalized Population-Based Training for Hyperparameter Optimization in Reinforcement Learning

On the consistency of hyper-parameter selection in value-based deep reinforcement learning

Direct Random Search for Fine Tuning of Deep Reinforcement Learning Policies

Reinforcement Learning Driven Heuristic Optimization

Practical Bayesian Optimization of Machine Learning Algorithms

Bayesian Optimization is Superior to Random Search for Machine Learning Hyperparameter Tuning: Analysis of the Black-Box Optimization Challenge 2020

Towards Autonomous Reinforcement Learning: Automatic Setting of Hyper-parameters using Bayesian Optimization

Efficient hyperparameters optimization through model-based reinforcement learning with experience exploiting and meta-learning

Dropout Strategy in Reinforcement Learning: Limiting the Surrogate Objective Variance in Policy Optimization Methods

Decision Confidence and Outcome Variability Optimally Regulate Separate Aspects of Hyperparameter Setting

Reducing the Cost of Cycle-Time Tuning for Real-World Policy Optimization

Combining Automated Optimisation of Hyperparameters and Reward Shape

Hyperparameter Optimization for Multi-Objective Reinforcement Learning

ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning

Optimizing Deep Reinforcement Learning for Adaptive Robotic Arm Control

Hyperparameter Auto-Tuning in Self-Supervised Robotic Learning

Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training

On Hyper-parameter Tuning for Stochastic Optimization Algorithms

Improving the Diversity of Bootstrapped DQN by Replacing Priors With Noise