Average Optimality for Pathwise Rewards

Xianping Guo, Onésimo Hernández-Lerma
DOI: https://doi.org/10.1007/978-3-642-02547-1_8
2009-01-01
Abstract: Chapter 8 studies the pathwise average reward (PAR) criterion for the MDP model of Chaps. 6 and 7. First, in Sect. 8.1, we present an example showing the difference between the expected average reward (EAR) criterion and the PAR criterion. In Sects. 8.2 and 8.3, we establish some preliminary results, which are then used in Sect. 8.4 to prove the existence of PAR-optimal policies. In Sect. 8.5, we provide policy and value iteration algorithms for computing a PAR-optimal policy. We conclude with an example in Sect. 8.6.
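To give a rough idea of what a policy iteration algorithm for an average reward criterion looks like, here is a minimal sketch. It covers only the simplest case, which is our assumption and not the chapter's setting: finitely many states and actions, and a chain that is irreducible under every stationary policy. Under that assumption the ergodic theorem gives that the pathwise average reward of a stationary policy equals its expected average gain g almost surely, so the standard average-reward policy iteration applies. The evaluation step solves g = r(i, f(i)) + Σ_j q(j | i, f(i)) h(j) with h(0) = 0. The improvement step picks actions that maximize r(i, a) + Σ_j q(j | i, a) h(j). The function names, the notation, and the two-state example data are hypothetical. This is not the algorithm of Sect. 8.5, which handles more general models.

```python
import numpy as np


def evaluate(policy, Q, r):
    """Solve g = r(i, f(i)) + sum_j q(j|i, f(i)) h(j) for gain g and bias h, with h[0] = 0.

    Q has shape (actions, states, states); each Q[a] is a transition-rate matrix
    whose rows sum to zero. r has shape (states, actions). Assumes the chain
    under `policy` is irreducible, so the linear system is nonsingular.
    """
    n = len(policy)
    A = np.zeros((n, n))
    b = np.zeros(n)
    for i, a in enumerate(policy):
        A[i, 0] = 1.0            # column 0 holds g (h[0] is fixed to 0)
        A[i, 1:] = -Q[a][i, 1:]  # columns 1..n-1 hold h[1..n-1]
        b[i] = r[i, a]
    x = np.linalg.solve(A, b)
    return x[0], np.concatenate(([0.0], x[1:]))


def policy_iteration(Q, r, max_iter=100, tol=1e-10):
    """Average-reward policy iteration for a finite, irreducible continuous-time MDP (sketch)."""
    n, m = r.shape
    policy = [0] * n
    for _ in range(max_iter):
        g, h = evaluate(policy, Q, r)
        new_policy = []
        for i in range(n):
            vals = [r[i, a] + Q[a][i] @ h for a in range(m)]
            # Keep the current action when it is already optimal; this prevents cycling.
            if vals[policy[i]] >= max(vals) - tol:
                new_policy.append(policy[i])
            else:
                new_policy.append(int(np.argmax(vals)))
        if new_policy == policy:
            return policy, g, h
        policy = new_policy
    raise RuntimeError("policy iteration did not converge")


# Hypothetical two-state example: action 1 speeds up the jump 0 -> 1 at a cost,
# or speeds up the jump 1 -> 0 while earning less.
Q = np.array([
    [[-1.0, 1.0], [1.0, -1.0]],  # rates under action 0
    [[-2.0, 2.0], [4.0, -4.0]],  # rates under action 1
])
r = np.array([[0.0, -1.0],       # reward rates in state 0 for actions 0, 1
              [3.0, 2.0]])       # reward rates in state 1 for actions 0, 1
policy, g, h = policy_iteration(Q, r)
print(policy, g)  # [1, 0] with gain approximately 5/3
```

In this example the four stationary policies have gains 1.5, 5/3, 0.4 and 0, which can be checked from the two-state stationary distribution. The iteration moves from [0, 0] to the optimal [1, 0] in a single improvement step.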