Risk-Sensitive Optimal Control in Communicating Average Markov Decision Chains

Rolando Cavazos-Cadena, Emmanuel Fernández-Gaucherand
DOI: https://doi.org/10.1007/0-306-48102-2_22
2002-01-01
Abstract: This work concerns discrete-time Markov decision processes with denumerable state space and bounded costs per stage. The performance of a control policy is measured by a (long-run) risk-sensitive average cost criterion associated with a utility function with constant risk sensitivity coefficient λ, and the main objective of the paper is to study the existence of bounded solutions to the risk-sensitive average cost optimality equation for arbitrary values of λ. The main results are as follows: When the state space is finite, if the transition law is communicating, in the sense that under an arbitrary stationary policy transitions are possible between every pair of states, the optimality equation has a bounded solution for arbitrary non-null λ. However, when the state space is infinite and denumerable, the communication requirement and a strong form of the simultaneous Doeblin condition do not, in general, yield a bounded solution to the optimality equation if the risk sensitivity coefficient has a sufficiently large absolute value.
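To make the finite-state result concrete, the following is a minimal numerical sketch, not the authors' method. It assumes a commonly used multiplicative form of the optimality equation, exp(λ(g + h(x))) = opt over actions a of exp(λ C(x, a)) Σ_y p(y | x, a) exp(λ h(y)), where "opt" is a minimum for λ > 0 and a maximum for λ < 0. The abstract does not state this form, so it is an assumption here. The names `solve_risk_sensitive_oe` and `oe_residual`, the array layout `P[a][x][y]` and `C[x][a]`, and the example data are all hypothetical. The example uses strictly positive transition probabilities, which is stronger than the communicating condition but keeps the simple normalized iteration well behaved.

```python
import math


def _operator(P, C, lam, v):
    """Apply the assumed multiplicative dynamic-programming operator to v = exp(lam * h)."""
    n, n_actions = len(C), len(P)
    pick = min if lam > 0 else max  # minimizing cost flips to max when lam < 0
    return [
        pick(
            math.exp(lam * C[x][a]) * sum(P[a][x][y] * v[y] for y in range(n))
            for a in range(n_actions)
        )
        for x in range(n)
    ]


def solve_risk_sensitive_oe(P, C, lam, iters=2000):
    """Normalized fixed-point iteration; returns (g, h) with h[0] == 0.

    P[a][x][y]: transition probability x -> y under action a (assumed > 0).
    C[x][a]:    bounded cost of action a in state x.
    lam:        nonzero risk sensitivity coefficient.
    """
    if lam == 0:
        raise ValueError("lam must be nonzero")
    v = [1.0] * len(C)
    rho = 1.0
    for _ in range(iters):
        w = _operator(P, C, lam, v)
        rho = w[0]                   # v[0] is kept at 1, so w[0] estimates exp(lam * g)
        v = [wx / rho for wx in w]   # renormalize to avoid overflow or underflow
    g = math.log(rho) / lam
    h = [math.log(vx) / lam for vx in v]
    return g, h


def oe_residual(P, C, lam, g, h):
    """Largest absolute violation of the assumed optimality equation."""
    v = [math.exp(lam * hx) for hx in h]
    rhs = _operator(P, C, lam, v)
    return max(abs(math.exp(lam * (g + h[x])) - rhs[x]) for x in range(len(h)))


# Hypothetical two-state, two-action example with all transitions possible.
P = [
    [[0.9, 0.1], [0.5, 0.5]],  # action 0
    [[0.2, 0.8], [0.6, 0.4]],  # action 1
]
C = [[1.0, 2.0], [3.0, 0.5]]

for lam in (0.5, -0.5, 4.0):
    g, h = solve_risk_sensitive_oe(P, C, lam)
    print(lam, g, h, oe_residual(P, C, lam, g, h))
```

On a finite state space any solution h is automatically bounded, so the sketch only illustrates what a solution pair (g, h) looks like; it says nothing about the denumerable case, where the abstract reports that bounded solutions can fail to exist for large |λ|.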