Exploring Variational Deep Q Networks

A. H. Bell-Thomas
DOI: https://doi.org/10.48550/arXiv.2008.01641
2020-08-04
Abstract: This study provides both analysis and a refined, research-ready implementation of Tang and Kucukelbir's Variational Deep Q Network, a novel approach to maximising the efficiency of exploration in complex learning environments using Variational Bayesian Inference. Alongside reference implementations of both Traditional and Double Deep Q Networks, a small novel contribution is presented: the Double Variational Deep Q Network, which incorporates improvements to increase the stability and robustness of inference-based learning. Finally, the effectiveness of these approaches is evaluated and discussed in the wider context of Bayesian Deep Learning.
Machine Learning, Artificial Intelligence
### What problem does this paper attempt to solve?

This paper explores how to maximize exploration efficiency in complex environments. Specifically, it extends Deep Q Networks (DQN) with Variational Bayesian Inference so that the agent more effectively balances exploring the state space against exploiting actions already known to be effective.

#### Background and problem description

In Reinforcement Learning (RL), an agent must learn an optimal policy in an unknown environment to maximize cumulative reward. In a complex environment, the agent faces a fundamental trade-off between exploring new states (exploration) and exploiting known effective actions (exploitation). Traditional Q-learning methods such as DQN can also show large overestimation errors in some stochastic environments, resulting in poor performance.

To address this, the paper studies the **Variational Deep Q Network (VDQN)**, a method based on Variational Bayesian Inference. VDQN maintains prior and posterior distributions over the network parameters to represent their uncertainty, which encourages the agent to explore more effectively.

#### Main contributions

1. **Implementation of VDQN**:
   - The paper provides a research-ready implementation of the VDQN proposed by Tang and Kucukelbir.
   - It implements Variational Bayesian Inference using the Edward Probabilistic Programming Library (PPL).
2. **Proposal of Double VDQN**:
   - A new variant, Double VDQN (DVDQN), is proposed, which incorporates the stabilization techniques proposed by Mnih et al. to improve the stability and robustness of the inference method.
3. **Evaluation and discussion**:
   - VDQN and DVDQN are evaluated in detail and compared with traditional DQN and Double DQN.
   - The effectiveness of these methods is discussed in the wider context of Bayesian deep learning.
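The overestimation issue mentioned above, and the reason Double DQN appears as a baseline, can be illustrated by comparing how the two bootstrap targets are commonly computed. The following is a minimal sketch under stated assumptions: it uses the widely used textbook formulation of the two targets rather than the paper's own code, and the function names `dqn_target` and `double_dqn_target`, as well as the array layout (one row of Q-values per transition), are hypothetical choices made for illustration.

```python
import numpy as np


def dqn_target(rewards, q_target_next, gamma=0.99):
    """Standard target: r + gamma * max_a Q_target(s', a).

    The same network both selects and evaluates the next action,
    which tends to bias the target upwards when Q-values are noisy.
    """
    return rewards + gamma * q_target_next.max(axis=1)


def double_dqn_target(rewards, q_online_next, q_target_next, gamma=0.99):
    """Double Q-learning target (illustrative formulation).

    The online network selects the next action; the target network
    evaluates it: r + gamma * Q_target(s', argmax_a Q_online(s', a)).
    """
    best_actions = np.argmax(q_online_next, axis=1)
    rows = np.arange(len(rewards))
    return rewards + gamma * q_target_next[rows, best_actions]


if __name__ == "__main__":
    # One transition, two actions; the values are made up for illustration.
    rewards = np.array([0.0])
    q_online_next = np.array([[1.0, 2.0]])   # online net prefers action 1
    q_target_next = np.array([[3.0, 0.5]])   # target net prefers action 0

    print(dqn_target(rewards, q_target_next, gamma=0.5))
    print(double_dqn_target(rewards, q_online_next, q_target_next, gamma=0.5))
```

Because the Double target evaluates an action chosen by a different network, it can never exceed the standard target computed from the same target-network values, which is the mechanism usually credited with reducing overestimation.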
#### Summary of mathematical formulas

- **Q-function**:
  \[ Q(s, a): S\times A\rightarrow\mathbb{R} \]
- **Optimal policy**:
  \[ \pi^{*}=\arg\max_{\pi}\mathbb{E}_{\sigma\sim\pi(\cdot|s_{0})}\left[\sum_{t\in\sigma}Q^{\pi}(s_{t},a_{t})\cdot\gamma^{t}\right] \]
- **Q-learning update rule**:
  \[ Q_{\text{next}}(s_{t},a_{t})\leftarrow(1 - \alpha)\cdot Q(s_{t},a_{t})+\alpha\cdot(r_{t}+\gamma\cdot\max_{a}Q(s_{t + 1},a)) \]
- **Objective function of VDQN**:
  \[ \mathbb{E}_{\theta\sim q_{\phi}(\theta)}\left[\left(Q_{\theta}(s_{j},a_{j})-\max_{a'}\mathbb{E}[r_{j}+\gamma\cdot Q_{\theta}(s'_{j},a')]\right)^{2}\right]-\lambda H(q_{\phi}(\theta)) \]

#### Conclusion

With VDQN and DVDQN, this paper aims to address the low exploration efficiency of traditional DQN in complex environments. The experimental results show that VDQN and DVDQN learn faster and more stably on some simple tasks, and converge better when dealing with variational inference (VI) losses. Future research can further optimize these methods.
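The VDQN objective listed in the formula summary can be estimated by Monte Carlo sampling of the parameters. The sketch below is only an illustration under several assumptions that are not taken from the paper: a linear Q-function \(Q_{\theta}(s,a)=\theta_{a}\cdot s\), a diagonal Gaussian for \(q_{\phi}(\theta)\) with parameters `mu` and `log_sigma`, and the inner expectation in the target read as an expectation over \(\theta\sim q_{\phi}\). All function names are hypothetical, and the paper's implementation uses Edward rather than hand-written NumPy.

```python
import numpy as np


def gaussian_entropy(log_sigma):
    """Entropy H(q) of a diagonal Gaussian given log standard deviations."""
    d = log_sigma.size
    return float(np.sum(log_sigma) + 0.5 * d * (1.0 + np.log(2.0 * np.pi)))


def q_values(theta, states):
    """Linear Q-function (assumed form).

    theta:  (K, A, D) sampled parameters, one weight vector per action
    states: (B, D) batch of states
    returns (K, B, A) Q-values for every sample, state and action
    """
    return np.einsum("...ad,bd->...ba", theta, states)


def vdqn_objective(mu, log_sigma, batch, gamma=0.99, lam=0.01,
                   n_samples=32, rng=None):
    """Monte Carlo estimate of E_q[(Q - target)^2] - lam * H(q)."""
    rng = np.random.default_rng(0) if rng is None else rng
    s, a, r, s_next = batch

    # Reparameterised samples theta = mu + sigma * eps, eps ~ N(0, I).
    eps = rng.standard_normal((n_samples,) + mu.shape)
    theta = mu + np.exp(log_sigma) * eps

    # Q_theta(s_j, a_j) for each sample and transition: shape (K, B).
    q_sa = q_values(theta, s)[:, np.arange(len(a)), a]

    # Target: r_j + gamma * max_a' E_theta[Q_theta(s'_j, a')].
    q_next_mean = q_values(theta, s_next).mean(axis=0)
    target = r + gamma * q_next_mean.max(axis=1)

    bellman_error = np.mean((q_sa - target) ** 2)
    return float(bellman_error - lam * gaussian_entropy(log_sigma))


if __name__ == "__main__":
    # Two transitions, two actions, three state features (made-up data).
    mu = np.zeros((2, 3))
    log_sigma = np.full((2, 3), -1.0)
    batch = (np.ones((2, 3)), np.array([0, 1]),
             np.array([1.0, 2.0]), np.ones((2, 3)))
    print(vdqn_objective(mu, log_sigma, batch))
```

The entropy term rewards a wider posterior, so a larger `lam` pushes the agent toward keeping more parameter uncertainty, which is the source of exploration in this family of methods.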