Convergence of the Q-ae Learning under Deterministic MDPs and Its Efficiency under the Stochastic Environment

G Zhao, S Tatsumi, RY Sun
DOI: https://doi.org/10.1109/icsmc.2000.884985
2000-01-01
IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences
Abstract: Reinforcement learning (RL) is an efficient method for solving Markov Decision Processes (MDPs) without any a priori knowledge of the environment. Q-learning is a representative RL method. Although it is guaranteed to derive the optimal policy, Q-learning needs numerous trials to learn it. By exploiting a feature of the Q value, this paper presents an accelerated RL method, Q-ae learning. Further, using the dynamic programming principle, the paper proves that Q-ae learning converges to the optimal policy under deterministic MDPs. Analytical and simulation results illustrate the efficiency of Q-ae learning under both deterministic and stochastic MDPs.
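For orientation, the baseline that the abstract calls slow can be sketched as follows. This is standard tabular Q-learning, not the paper's Q-ae method, whose update rule is not given here. The five-state chain environment, the function names (`step`, `q_learning`, `greedy_policy`) and all parameter values are assumptions chosen purely for illustration.

```python
import random

# Hypothetical toy environment (an assumption, not taken from the paper):
# a deterministic chain of states 0..4 where state 4 is the terminal goal.
N_STATES = 5
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9           # discount factor
ALPHA = 1.0           # a full-step learning rate is safe when transitions are deterministic


def step(state, action):
    """Deterministic transition: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=200, max_steps=100, seed=0):
    """Standard tabular Q-learning with a uniformly random behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = rng.choice(ACTIONS)
            s2, r, done = step(s, a)
            # Bootstrapped target: reward plus discounted best next-state value.
            target = r if done else r + GAMMA * max(q[s2])
            q[s][a] += ALPHA * (target - q[s][a])
            s = s2
            if done:
                break
    return q


def greedy_policy(q):
    """Greedy action for every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = q_learning()
    print(greedy_policy(q_table))
```

Even on this tiny chain, reward information moves back only one state at a time through repeated visits, which is the kind of trial cost the abstract says an accelerated method aims to reduce.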