Markov Decision Processes with Fractional Costs

Zhiyuan Ren, B. H. Krogh
DOI: https://doi.org/10.1109/TAC.2005.846520
2005-01-01
Abstract: Certain methods for constructing embedded Markov decision processes (MDPs) lead to performance measures that are the ratio of two long-run averages. For such MDPs with finite state and action spaces and under an ergodicity assumption, this note presents algorithms for computing optimal policies based on policy iteration, linear programming, value iteration, and Q-learning.
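To make the "ratio of two long-run averages" concrete, here is a minimal Python sketch that evaluates that ratio for each stationary deterministic policy of a toy two-state, two-action MDP and picks the smallest. Everything in it is an illustrative assumption rather than material from the note: the transition matrices `P`, the per-step costs `cost`, the per-step durations `duration`, and the helper names are hypothetical. The brute-force enumeration stands in for, and is not one of, the four algorithms the abstract lists. It also assumes every policy induces an ergodic, aperiodic chain, so that power iteration converges to the stationary distribution.

```python
from itertools import product

# Hypothetical toy model (not from the paper): 2 states, 2 actions.
# P[a][s] is the transition row out of state s under action a.
P = {
    0: [[0.5, 0.5], [0.5, 0.5]],
    1: [[0.9, 0.1], [0.2, 0.8]],
}
cost = [[1.0, 2.0], [4.0, 3.0]]      # cost[s][a]: numerator reward/cost per step
duration = [[1.0, 2.0], [2.0, 1.0]]  # duration[s][a]: denominator, strictly positive


def stationary(rows, iters=1000):
    """Approximate the stationary distribution of a chain by power iteration.

    Assumes the chain given by `rows` is ergodic and aperiodic.
    """
    n = len(rows)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[s] * rows[s][t] for s in range(n)) for t in range(n)]
    return pi


def ratio(policy):
    """Long-run average cost divided by long-run average duration under `policy`."""
    rows = [P[a][s] for s, a in enumerate(policy)]
    pi = stationary(rows)
    num = sum(pi[s] * cost[s][a] for s, a in enumerate(policy))
    den = sum(pi[s] * duration[s][a] for s, a in enumerate(policy))
    return num / den


# Enumerate all deterministic stationary policies (one action per state).
best_policy = min(product(range(2), repeat=2), key=ratio)
best_ratio = ratio(best_policy)
print(best_policy, best_ratio)  # (1, 0) with ratio 7/6, approximately 1.1667
```

Enumeration is only workable for very small models, since the number of deterministic policies grows exponentially with the number of states. That is one reason iterative or linear-programming formulations like those named in the abstract are of interest.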