A Scalable Derivative-free Exploration Approach for Reinforcement Learning

Xiong-Hui Chen
DOI: https://doi.org/10.21203/rs.3.rs-1973394/v1
2022-01-01
Abstract: Exploration in complex environments remains a challenge for deep reinforcement learning (RL). Derivative-free optimization (DFO), which provides efficient black-box solution sampling mechanisms, is a potential way to address this issue. Recent studies that inject these mechanisms into RL have shown better exploration abilities than derivative-based policy optimization methods. However, we found that these methods suffer from low sample efficiency in high-dimensional policy parameter spaces, which limits their applicability to complex tasks that require large neural networks to represent well-performing policies. In this paper, we propose a scalable exploration algorithm based on derivative-free optimization, named Scalable Derivative-free Exploration (SDFE), which improves exploration in complex RL problems. SDFE handles the high-dimensional policy search problem by optimizing the policy in a small, transformed parameter space rather than in the original parameter space of the policy neural network. SDFE is a general framework that is compatible with derivative-free optimization methods and off-policy policy-based algorithms. To show its efficiency and adaptability, we instantiate SDFE with two derivative-free optimization algorithms (SRACOS and CMAES) and three off-policy actor-critic algorithms (SAC, ACER, and DDPG). We conduct experiments on MuJoCo tasks and 42 pixel-based Atari games, empirically verifying that SDFE can reach better performance with a competitive number of samples on both benchmarks.
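The idea of searching in a small transformed space instead of the full policy parameter space can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the transformation is a fixed random linear projection `P`, the optimizer is a simple elitist random search (standing in for SRACOS or CMAES), and `toy_return` is a hypothetical stand-in for the return of an RL rollout.

```python
import numpy as np

def toy_return(theta, target):
    """Hypothetical black-box objective: higher is better.
    Scores a linear policy (flat vector theta) against a target linear map."""
    act_dim, obs_dim = target.shape
    obs = np.random.default_rng(0).normal(size=(32, obs_dim))  # fixed batch
    W = theta.reshape(act_dim, obs_dim)
    return -np.mean((obs @ W.T - obs @ target.T) ** 2)

def low_dim_search(theta0, P, objective, iters=200, pop=16, sigma=0.1, seed=0):
    """Elitist random search over a low-dimensional vector z.
    Full parameters are reconstructed as theta0 + P @ z (assumed transformation)."""
    rng = np.random.default_rng(seed)
    z = np.zeros(P.shape[1])
    best = objective(theta0 + P @ z)
    for _ in range(iters):
        # Sample candidates around the current best point in the small space.
        candidates = z + sigma * rng.normal(size=(pop, z.size))
        scores = np.array([objective(theta0 + P @ c) for c in candidates])
        i = int(np.argmax(scores))
        if scores[i] > best:  # keep a candidate only if it improves the score
            z, best = candidates[i], scores[i]
    return z, best

obs_dim, act_dim, k = 20, 4, 8            # 80 full parameters, 8 searched
rng = np.random.default_rng(1)
target = rng.normal(size=(act_dim, obs_dim))
theta0 = np.zeros(obs_dim * act_dim)
P = rng.normal(size=(theta0.size, k)) / np.sqrt(k)
objective = lambda theta: toy_return(theta, target)

initial = objective(theta0)
z, best = low_dim_search(theta0, P, objective)
print(f"initial score {initial:.3f}, best score found {best:.3f}")
```

The derivative-free optimizer here only ever samples 8-dimensional vectors, while the policy being evaluated has 80 parameters; how SDFE actually constructs its transformed space and combines it with off-policy actor-critic updates is described in the paper itself, not in this sketch.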