Online meta-learning by parallel algorithm competition

Stefan Elfwing,Eiji Uchibe,Kenji Doya
DOI: https://doi.org/10.1145/3205455.3205486
2018-07-02
Abstract:The efficiency of reinforcement learning algorithms depends critically on a few meta-parameters that modulate the learning updates and the trade-off between exploration and exploitation. The adaptation of the meta-parameters is an open question, which arguably has become a more important issue recently with the success of deep reinforcement learning. The long learning times in domains such as Atari 2600 video games makes it not feasible to perform comprehensive searches of appropriate meta-parameter values. In this study, we propose the Online Meta-learning by Parallel Algorithm Competition (OMPAC) method, which is a novel Lamarckian evolutionary approach to online meta-parameter adaptation. The population consists of several instances of a reinforcement learning algorithm which are run in parallel with small differences in initial meta-parameter values. After a fixed number of learning episodes, the instances are selected based on their performance on the task at hand, i.e., the fitness. Before continuing the learning, Gaussian noise is added to the meta-parameters with a predefined probability. We validate the OMPAC method by improving the state-of-the-art results in stochastic SZ-Tetris and in 10x10 Tetris by 31% and 84%, respectively, and by improving the learning speed and performance for deep Sarsa() agents in the Atari 2600 domain.
What problem does this paper attempt to address?