A Q-learning algorithm for Markov decision processes with continuous state spaces

Jiaqiao Hu, Xiangyu Yang, Jian-Qiang Hu, Yijie Peng
DOI: https://doi.org/10.1016/j.sysconle.2024.105782
IF: 2.742
2024-05-01
Systems & Control Letters
Abstract: We propose an online algorithm for solving a class of continuous-state Markov decision processes. The algorithm combines classical Q-learning with an asynchronous averaging procedure, which allows Q-function estimates at sampled state–action pairs to be adaptively updated based on observations collected along a single sample trajectory. These estimates are then used to iteratively construct an interpolation-based function approximator of the Q-function. We prove the convergence of the algorithm and provide numerical results to illustrate its performance.
automation & control systems, operations research & management science
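
The abstract describes three ingredients: Q-learning updates at sampled state–action pairs, asynchronous averaging along a single trajectory, and an interpolation-based approximator of the Q-function. The sketch below is one possible reading of that description, not the authors' algorithm. Everything specific in it is an assumption made for illustration: the one-dimensional state space [0, 1], the two-action toy dynamics in `step`, the fixed uniform grid of sampled states, the nearest-grid-point assignment, the per-pair step size `n ** -0.7`, the uniformly random behavior policy, and linear interpolation via `np.interp`.

```python
import numpy as np

def q_interp(grid, Q, s):
    """Linearly interpolate each action's Q-values from the grid to state s."""
    return np.array([np.interp(s, grid, Q[:, a]) for a in range(Q.shape[1])])

def step(s, a, rng):
    """Hypothetical toy dynamics on [0, 1]: action 1 drifts right, action 0 left."""
    drift = 0.1 if a == 1 else -0.1
    s_next = float(np.clip(s + drift + 0.02 * rng.standard_normal(), 0.0, 1.0))
    return s_next, s_next  # assumed reward: the value of the next state

def run(n_steps=20000, gamma=0.5, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 11)   # assumed set of sampled states
    Q = np.zeros((grid.size, 2))       # Q-estimates at (grid state, action) pairs
    counts = np.zeros_like(Q)          # visit counts drive the asynchronous averaging
    s = 0.5
    for _ in range(n_steps):           # one single sample trajectory
        a = int(rng.integers(2))       # uniformly random exploration (assumption)
        s_next, r = step(s, a, rng)
        i = int(np.argmin(np.abs(grid - s)))  # nearest sampled state to s
        counts[i, a] += 1
        alpha = counts[i, a] ** -0.7   # step size depends only on this pair's visits
        # Bootstrap from the interpolated Q-function at the continuous next state.
        target = r + gamma * q_interp(grid, Q, s_next).max()
        Q[i, a] += alpha * (target - Q[i, a])
        s = s_next
    return grid, Q, counts
```

Calling `grid, Q, counts = run()` returns the grid, the learned table of Q-estimates, and the visit counts; `q_interp(grid, Q, s)` then evaluates the approximate Q-function at any state `s` in [0, 1]. Only the visited pair is updated at each step, which is what makes the averaging asynchronous in this sketch.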