Discount Optimality for Unbounded Rewards

Xianping Guo, Onésimo Hernández-Lerma
DOI: https://doi.org/10.1007/978-3-642-02547-1_6
2009-01-01
Abstract: Chapter 6 concerns the expected α-discounted reward criterion for continuous-time MDPs with unbounded transition rates and reward functions that may have neither upper nor lower bounds. We begin in Sect. 6.1 by introducing the “uniformization technique,” which is widely used in continuous-time MDPs, and then, in Sect. 6.2, we establish the discounted reward optimality equation by using a value iteration technique. The existence of discounted reward optimal policies and a value iteration algorithm are given in Sects. 6.3 and 6.4, respectively. Finally, in Sect. 6.5, several examples are provided to illustrate the results of this chapter.