A Study of Count-Based Exploration and Bonus for Reinforcement Learning

Zhi-Xiong Xu, Xi-Liang Chen, Lei Cao, Chen-Xi Li
DOI: https://doi.org/10.1109/icccbda.2017.7951951
2017-01-01
Abstract: To better balance exploration and exploitation and to address the problem of sparse rewards in reinforcement learning, this paper proposes a method of count-based exploration and bonus for reinforcement learning. The method replaces traditional exploration strategies, such as the ε-greedy strategy, with a new behavioral exploration strategy that introduces the state-action visitation count into the Boltzmann distribution method, and it adds a count-based exploration bonus to guide the agent toward exploration. With Sarsa(λ) as the base learning algorithm, the paper verifies the effectiveness of the count-based Sarsa(λ) learning algorithm through comparative experiments on the tank combat simulation problem, comparing the Q-learning algorithm based on the ε-greedy strategy, the Sarsa(λ) learning algorithm based on the ε-greedy strategy, and the count-based Sarsa(λ) learning algorithm. Owing to its simplicity, the method provides an easy yet powerful baseline for solving MDPs that require informed exploration.
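The abstract does not give the exact formulas, so the following is only a minimal sketch of how a visitation count could enter a Boltzmann action distribution. The bonus form beta / sqrt(N + 1), the way it is added to the Q-value before the softmax, and all names (count_bonus, boltzmann_probs, select_action, beta, temperature) are assumptions made for illustration, not the authors' definitions.

```python
import math
import random


def count_bonus(count, beta=0.1):
    """Assumed bonus form: shrinks as the state-action pair is visited more."""
    return beta / math.sqrt(count + 1)


def boltzmann_probs(q_values, counts, temperature=1.0, beta=0.1):
    """Softmax over (Q-value + count bonus) / temperature for one state.

    q_values and counts are per-action lists for the current state.
    Adding the bonus inside the softmax is an assumption of this sketch.
    """
    prefs = [(q + count_bonus(n, beta)) / temperature
             for q, n in zip(q_values, counts)]
    top = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, counts, rng, temperature=1.0, beta=0.1):
    """Sample an action index from the count-aware Boltzmann distribution."""
    probs = boltzmann_probs(q_values, counts, temperature, beta)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Usage: two actions with equal Q-values; the second has been tried far more.
rng = random.Random(0)
probs = boltzmann_probs([0.0, 0.0], [0, 99])
action = select_action([0.0, 0.0], [0, 99], rng)
```

In this sketch, two actions with equal Q-values are no longer equally likely: the less-visited one receives the larger bonus and therefore the higher selection probability. The same bonus could also be added to the reward inside a Sarsa(λ) update, which is again an assumption rather than something the abstract specifies.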