Constrained Markov Decision Processes with First Passage Criteria

Yonghui Huang,Qingda Wei,Xianping Guo
DOI: https://doi.org/10.1007/s10479-012-1292-1
2012-01-01
Abstract:This paper deals with constrained Markov decision processes (MDPs) with first passage criteria. The objective is to maximize the expected reward obtained during a first passage time to some target set, and a constraint is imposed on the associated expected cost over this first passage time. The state space is denumerable, and the rewards/costs are possibly unbounded. In addition, the discount factor is state-action dependent and is allowed to be equal to one. We develop suitable conditions for the existence of a constrained optimal policy, which are generalizations of those for constrained MDPs with the standard discount criteria. Moreover, it is revealed that the constrained optimal policy randomizes between two stationary policies differing in at most one state. Finally, we use a controlled queueing system to illustrate our results, which exhibits some advantage of our optimality conditions.
What problem does this paper attempt to address?