Quantity vs. Quality: On Hyperparameter Optimization for Deep Reinforcement Learning

Lars Hertel, Pierre Baldi, Daniel L. Gillen
DOI: https://doi.org/10.48550/arXiv.2007.14604
2020-07-30
Abstract: Reinforcement learning algorithms can show strong variation in performance between training runs with different random seeds. In this paper we explore how this affects hyperparameter optimization when the goal is to find hyperparameter settings that perform well across random seeds. In particular, we benchmark whether it is better to explore a large quantity of hyperparameter settings via pruning of bad performers, or if it is better to aim for quality of collected results by using repetitions. For this we consider the Successive Halving, Random Search, and Bayesian Optimization algorithms, the latter two with and without repetitions. We apply these to tuning the PPO2 algorithm on the Cartpole balancing task and the Inverted Pendulum Swing-up task. We demonstrate that pruning may negatively affect the optimization and that repeated sampling does not help in finding hyperparameter settings that perform better across random seeds. From our experiments we conclude that Bayesian optimization with a noise robust acquisition function is the best choice for hyperparameter optimization in reinforcement learning tasks.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper examines how to tune hyperparameters in deep reinforcement learning so that the chosen settings perform well across different random seeds, given that performance can vary strongly from one training run to the next. The authors compare three hyperparameter optimization methods: Random Search, Successive Halving (SHA), and Bayesian Optimization, the latter two with and without repeated evaluations of each setting. The comparison is carried out by tuning the PPO2 algorithm on the Cartpole balancing task and the Inverted Pendulum Swing-up task. The study finds that Bayesian Optimization with a noise-robust acquisition function (such as qNEI) performs best. Exploring more configurations by pruning poor performers, as SHA does, does not outperform Bayesian Optimization and may even harm the optimization. Likewise, evaluating each hyperparameter setting several times does not lead to settings that perform better across seeds than evaluating each setting once. The paper therefore recommends Bayesian Optimization with a noise-robust acquisition function as the preferred method for hyperparameter optimization when training outcomes are noisy.
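The trade-off between quantity (many settings, pruned early) and quality (fewer settings, each repeated) can be illustrated with a small toy sketch. Nothing here comes from the paper's code: `noisy_score` is a hypothetical stand-in for one training run, the single "learning rate" search space and the noise level are assumptions, and Bayesian Optimization is left out because it would need a dedicated library. The sketch only shows how the same budget of runs is spent differently by the two strategies.

```python
import random
import statistics


def noisy_score(lr, rng, noise=0.3):
    """Hypothetical stand-in for one RL training run.

    The assumed true quality peaks at lr = 0.1; Gaussian noise mimics
    the seed-to-seed variation described in the text.
    """
    true_quality = 1.0 - abs(lr - 0.1) * 5.0
    return true_quality + rng.gauss(0.0, noise)


def random_search(budget, repetitions, rng):
    """Spend `budget` runs on budget // repetitions random settings.

    Each setting is scored by the mean over `repetitions` noisy runs.
    Returns the best setting found and the number of settings tried.
    """
    n_settings = budget // repetitions
    best_lr, best_mean = None, float("-inf")
    for _ in range(n_settings):
        lr = rng.uniform(0.0, 0.2)
        mean = statistics.fmean([noisy_score(lr, rng) for _ in range(repetitions)])
        if mean > best_mean:
            best_lr, best_mean = lr, mean
    return best_lr, n_settings


def successive_halving(n_settings, rng, eta=2):
    """Simplified Successive Halving on the toy objective.

    Every rung scores the survivors with `resource` noisy runs each,
    keeps the top 1/eta, and multiplies the resource by eta.
    Returns the final survivor and the total number of runs used.
    """
    survivors = [rng.uniform(0.0, 0.2) for _ in range(n_settings)]
    resource, runs_used = 1, 0
    while len(survivors) > 1:
        scored = []
        for lr in survivors:
            mean = statistics.fmean([noisy_score(lr, rng) for _ in range(resource)])
            scored.append((mean, lr))
            runs_used += resource
        scored.sort(reverse=True)  # highest mean score first
        keep = max(1, len(survivors) // eta)
        survivors = [lr for _, lr in scored[:keep]]
        resource *= eta
    return survivors[0], runs_used


if __name__ == "__main__":
    rng = random.Random(0)
    print("random search, 1 run per setting:", random_search(64, 1, rng))
    print("random search, 4 runs per setting:", random_search(64, 4, rng))
    print("successive halving, 16 settings:", successive_halving(16, rng))
```

With a budget of 64 runs, random search without repetitions tries 64 settings, random search with 4 repetitions tries 16, and this simplified Successive Halving starts from 16 settings and also uses 64 runs in total. Which strategy ends up closer to the assumed optimum depends on the noise level and the seed, so the toy should not be read as reproducing the paper's results.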