Improving the Exploration Efficiency of DQNs Via the Confidence Bound Methods

Wen Yingpeng, Su Qinliang, Shen Minghua, Xiao Nong
DOI: https://doi.org/10.1007/s10489-022-03363-0
IF: 5.3
2022-01-01
Applied Intelligence
Abstract: Deep Q-Learning Networks (DQNs) have achieved great success, but the 𝜖-greedy exploration strategy adopted in the original DQNs explores the state-action space very inefficiently, so the model easily gets stuck at a poor local optimum. To address this issue, and inspired by the success of upper confidence bound (UCB) methods, which primarily target bandit problems, two more efficient exploration strategies are proposed that estimate confidence bounds on the future long-term returns. The proposed methods are adapted to high-dimensional DQNs by quantifying the state space and using a neural network to estimate the standard deviation. Experimental results show that the proposed methods outperform the original DQN in terms of both the total number of iterations and the final score. Notably, across the 15 games evaluated, the proposed methods achieve the highest score in 13 games, and the score increases by up to 8.27x compared to UCB-Exploration. When the algorithm is applied to the widely used Rainbow algorithm, substantial improvements are also observed.
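To make the contrast between 𝜖-greedy and confidence-bound exploration concrete, here is a minimal sketch. It is not the paper's exact method, which the abstract does not spell out; it assumes a generic UCB-style rule that picks the action maximizing the estimated return plus a multiple of its estimated standard deviation. The function names, the coefficient `c`, and the sample values are all hypothetical and chosen only for illustration.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Baseline: take a uniformly random action with probability epsilon,
    otherwise take the action with the highest estimated return."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def ucb_action(q_values, std_devs, c=1.0):
    """Generic UCB-style rule (an assumption, not the paper's formula):
    choose the action with the largest optimistic estimate q + c * std."""
    scores = [q + c * s for q, s in zip(q_values, std_devs)]
    return max(range(len(scores)), key=lambda a: scores[a])


# Hypothetical estimates for three actions in a single state.
q = [1.0, 0.8, 0.5]        # estimated long-term returns
sigma = [0.1, 0.6, 0.2]    # estimated standard deviations (uncertainty)

greedy_choice = epsilon_greedy_action(q, epsilon=0.0)  # purely greedy
ucb_choice = ucb_action(q, sigma, c=1.0)               # uncertainty-aware
```

With these values the greedy rule selects action 0, while the optimistic rule selects action 1, because its larger uncertainty raises its upper bound above that of action 0. Setting `c=0.0` reduces the rule to greedy selection, so exploration here is directed by uncertainty rather than by uniform random noise.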