Abstract: Inspired by the Double Q-learning algorithm, the Double DQN (DDQN) algorithm was originally proposed to address the overestimation issue in the original DQN algorithm. DDQN has successfully shown, both theoretically and empirically, the importance of decoupling action selection from action evaluation when computing target values. All of these benefits were obtained with only a simple adaptation of the DQN algorithm, the minimal possible change, as its authors noted. Nevertheless, DDQN appears to roll back part of DQN's design: the parameters of the policy network reappear in the target value function, even though DQN had removed them in order to tackle the serious issue of moving targets and the instability that moving targets cause during learning. Therefore, this paper proposes three modifications to the DDQN algorithm that aim to maintain its performance in terms of both stability and overestimation. These modifications focus on the logic of decoupling best-action selection from evaluation in the target value function and on the logic of tackling the moving-targets issue. Each modification has its own pros and cons compared to the others, which mainly concern the execution time the corresponding algorithm requires and the stability it provides. In terms of overestimation, none of the modifications appears to underperform the original DDQN; if anything, they may outperform it. To evaluate the efficacy of the proposed modifications, multiple empirical experiments, along with theoretical experiments, were conducted. The results obtained are presented and discussed in this article.
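To make the decoupling concrete, here is a minimal sketch of the standard DQN and Double DQN target computations for a single transition, following the formulation in the original DDQN work. Plain Python lists of next-state Q-values stand in for network outputs. The function names and arguments are hypothetical. The three modifications proposed in this paper are not shown, because the abstract does not specify them. The sketch only highlights where the online (policy) parameters re-enter the DDQN target.

```python
from typing import Sequence


def dqn_target(reward: float, next_q_target: Sequence[float],
               gamma: float, done: bool) -> float:
    """DQN target: y = r + gamma * max_a Q(s', a; theta_minus).

    The target network (theta_minus) both selects and evaluates the
    next action, so the online parameters theta never enter the target.
    """
    if done:  # terminal transition: no bootstrapping
        return reward
    return reward + gamma * max(next_q_target)


def ddqn_target(reward: float, next_q_online: Sequence[float],
                next_q_target: Sequence[float], gamma: float,
                done: bool) -> float:
    """Double DQN target:
    y = r + gamma * Q(s', argmax_a Q(s', a; theta); theta_minus).

    The online network (theta) selects the action and the target
    network (theta_minus) evaluates it, so theta reappears in the target.
    """
    if done:  # terminal transition: no bootstrapping
        return reward
    # Action selection uses the online (policy) network's Q-values.
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # Action evaluation uses the target network's Q-values.
    return reward + gamma * next_q_target[best_action]


if __name__ == "__main__":
    # Hypothetical Q-values for the next state s' over three actions.
    online = [1.0, 2.0, 0.5]
    target = [1.5, 0.8, 3.0]
    print("DQN target: ", dqn_target(1.0, target, 0.9, False))
    print("DDQN target:", ddqn_target(1.0, online, target, 0.9, False))
```

With these illustrative numbers, DQN bootstraps from the largest target-network value (3.0), while DDQN evaluates the online network's preferred action (index 1) with the target network (0.8), which yields a smaller target. This is the decoupling that counters overestimation. It also shows that the online parameters now influence the target through the argmax, which is the moving-target concern raised above.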

Qualitative Measurements of Policy Discrepancy for Return-Based Deep Q-Network

Policy Return: A New Method for Reducing the Number of Experimental Trials in Deep Reinforcement Learning

Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning

M$^2$DQN: A Robust Method for Accelerating Deep Q-learning Network

Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target

Generalization and Regularization in DQN

Understanding the Synergies between Quality-Diversity and Deep Reinforcement Learning

Deep Reinforcement Learning with Double Q-Learning

R-DDQN: Optimizing Algorithmic Trading Strategies Using a Reward Network in a Double DQN

Frustratingly Easy Regularization on Representation Can Boost Deep Reinforcement Learning

A Comparative Study of Deep Reinforcement Learning Models: DQN vs PPO vs A2C

Convergent and Efficient Deep Q Network Algorithm

Does DQN Learn?

Q-ADER: An Effective Q-Learning for Recommendation With Diminishing Action Space

Modified Double DQN: addressing stability

Soft Q Network

Deep Q-learning Sampling Based on Advantages

State of the Art Control of Atari Games Using Shallow Reinforcement Learning

Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning

A State Representation Dueling Network for Deep Reinforcement Learning

DDMA: Discrepancy-Driven Multi-agent Reinforcement Learning