Average cost Markov decision processes with countable state spaces
ZHANG Jun-yu,WU Yi-ting,XIA Li,CAO Xi-ren
DOI: https://doi.org/10.7641/CTA.2021.10763
2021-01-01
Abstract: For a Markov decision process (MDP) with a countable state space under the long-run average criterion, an optimal (stationary) policy may not exist. In this paper, we study optimal policies that satisfy an optimality inequality in a countable-state MDP under the long-run average criterion. Unlike the vanishing discount approach, we use the discrete Dynkin's formula to derive the main results. We first present the Poisson equation of an ergodic Markov chain and two instructive examples of null recurrent Markov chains, and demonstrate the existence of optimal policies for two optimality inequalities with opposite directions. Then, using two comparison lemmas and the performance difference formula, we prove the existence of optimal policies for positive recurrent chains and multi-chains, and further extend the results to other situations. In particular, several application examples are provided to illustrate the essence of the performance sensitivity of the long-run average. Our results supplement the existing literature on the optimality inequality of average-cost MDPs with countable state spaces.
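For orientation, the objects named in the abstract can be written in their commonly used textbook form. This is a sketch under assumed notation that does not come from the abstract itself: $P$ is the transition matrix, $f$ the cost function, $\eta$ the long-run average cost, $g$ the potential (bias) function, $\pi$ the stationary distribution, $e$ the all-ones vector, and primed symbols refer to a second policy. With countably many states, the sums involved must also converge, and the paper's exact conditions and statements may differ.

```latex
% Poisson equation of an ergodic chain (assumed notation: P, f, \eta, g, e)
(I - P)\, g + \eta\, e = f

% Performance difference formula between two ergodic policies
% (\pi' is the stationary distribution of P', so \pi' P' = \pi')
\eta' - \eta = \pi' \left[ (f' + P' g) - (f + P g) \right]

% One direction of the optimality inequality often used for average-cost MDPs
% (A(i) is the assumed action set at state i)
\eta + g(i) \;\ge\; \min_{a \in A(i)} \Big\{ f(i,a) + \sum_{j} p(j \mid i, a)\, g(j) \Big\}
```

The difference formula follows by multiplying the Poisson equation on the left by $\pi'$ and using $\pi' P' = \pi'$ and $\pi' e = 1$, provided $\pi' g$ is finite.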