Two-stage Population Based Training Method for Deep Reinforcement Learning
Yinda Zhou,Weiming Liu,Bin Li
DOI: https://doi.org/10.1145/3318265.3318294
2019-01-01
Abstract:Deep reinforcement learning (DRL) methods has been widely applied on more and more challenging learning tasks, and achieved excellent performance. However, the efficiency of deep reinforcement learning is notoriously sensitive to their own hyperparameter configuration. The optimization process of deep reinforcement learning is highly dynamic and non-stationary, rather than a simple fitting process. So, its optimal hyperparameter should be adaptively adjusted according to the current learning process, rather than using a fixed set of hyperparameter configurations from beginning to end. DeepMind innovatively proposed a population based training (PBT) method for deep reinforcement learning, which achieved hyperparameter adaptation and made the model better trained. However, we assume that at the early stage when the learning model has little knowledge of the environment, frequent hyperparameter change will not be helpful for the model to learn efficiently, while learning with a reasonable fixed hyperparameter configuration will help the model obtain necessary knowledge as quick as possible, which we consider is more important for reinforcement learning at early stage. In this paper, we verified our hypothesis through experiments, and a Two-Stage Population Based Training (TS-PBT) method is proposed, which is a more efficient population based training method for deep reinforcement learning. Experiments show that at the same computational budget, our TS-PBT method makes the final performance of the model significantly better than the PBT method. TS-PBT achieved 40%, 310%, 2%, 53%, 30% and 38% performance improvement over PBT separately in six test environments.