Convergence of Markov Decision Processes with Constraints and State-Action Dependent Discount Factors

Xiao Wu, Xianping Guo
DOI: https://doi.org/10.1007/s11425-017-9292-1
2020-01-01
Abstract: This paper is concerned with the convergence of a sequence of discrete-time Markov decision processes (DTMDPs) with constraints, state-action dependent discount factors, and possibly unbounded costs. Using the convex analytic approach under mild conditions, we prove that the optimal values and optimal policies of the original DTMDPs converge to those of the limit one. Furthermore, we show that any countable-state DTMDP can be approximated by a sequence of finite-state DTMDPs, which are constructed using the truncation technique. Finally, we illustrate the approximation by solving a controlled queueing system numerically, and give the corresponding error bound of the approximation.
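The truncation idea mentioned in the abstract can be illustrated with a small numerical sketch. The code below is not the authors' construction or their example: the queue model, the parameters (`p_arrive`, `p_serve`, `effort_cost`), and the form of the discount factor are all hypothetical, the constraints are omitted, and plain value iteration is swapped in for the convex analytic approach used in the paper. It only shows how a countable-state queue with an unbounded cost and a state-action dependent discount factor might be cut down to the states {0, ..., n_max} and solved.

```python
import numpy as np


def build_truncated_queue(n_max, p_arrive=0.4, p_serve=(0.3, 0.5),
                          effort_cost=(0.0, 1.0)):
    """Finite-state truncation {0, ..., n_max} of a hypothetical controlled queue.

    All parameters are illustrative assumptions, not values from the paper.
    Action a selects a service probability p_serve[a] at extra cost effort_cost[a].
    """
    n_states, n_actions = n_max + 1, len(p_serve)
    P = np.zeros((n_states, n_actions, n_states))   # P[x, a, y] = Pr(y | x, a)
    cost = np.zeros((n_states, n_actions))
    alpha = np.zeros((n_states, n_actions))         # state-action dependent discount
    for x in range(n_states):
        for a in range(n_actions):
            up = p_arrive if x < n_max else 0.0     # arrivals are blocked at n_max
            down = p_serve[a] if x > 0 else 0.0     # no service in an empty queue
            P[x, a, min(x + 1, n_max)] += up
            P[x, a, max(x - 1, 0)] += down
            P[x, a, x] += 1.0 - up - down
            cost[x, a] = x + effort_cost[a]         # holding cost grows without bound in x
            alpha[x, a] = 0.8 + 0.1 / (1 + x) - 0.05 * a   # stays within [0.75, 0.9]
    return P, cost, alpha


def value_iteration(P, cost, alpha, tol=1e-10, max_iter=10_000):
    """Unconstrained value iteration: V(x) = min_a [c(x,a) + alpha(x,a) * sum_y P(y|x,a) V(y)]."""
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        Q = cost + alpha * (P @ V)                  # P @ V has shape (n_states, n_actions)
        V_new = Q.min(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmin(axis=1)
        V = V_new
    return V, Q.argmin(axis=1)


if __name__ == "__main__":
    # Compare the optimal value at the empty-queue state across truncation levels.
    for n_max in (10, 20, 40):
        V, policy = value_iteration(*build_truncated_queue(n_max))
        print(n_max, V[0])
```

Running the script prints the value at state 0 for several truncation levels, which gives a rough feel for how the finite-state values settle as `n_max` grows. Handling the constraints would require a different formulation, for example a linear program over occupation measures, which this sketch does not attempt.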