Distributional Reinforcement Learning for Efficient Exploration

Borislav Mavrin,Shangtong Zhang,Hengshuai Yao,Linglong Kong,Kaiwen Wu,Yaoliang Yu
DOI: https://doi.org/10.48550/arXiv.1905.06125
2019-05-14
Abstract:In distributional reinforcement learning (RL), the estimated distribution of value function models both the parametric and intrinsic uncertainties. We propose a novel and efficient exploration method for deep RL that has two components. The first is a decaying schedule to suppress the intrinsic uncertainty. The second is an exploration bonus calculated from the upper quantiles of the learned distribution. In Atari 2600 games, our method outperforms QR-DQN in 12 out of 14 hard games (achieving 483 \% average gain across 49 games in cumulative rewards over QR-DQN with a big win in Venture). We also compared our algorithm with QR-DQN in a challenging 3D driving simulator (CARLA). Results show that our algorithm achieves near-optimal safety rewards twice faster than QRDQN.
Machine Learning,Artificial Intelligence
What problem does this paper attempt to address?