Abstract: This paper studies risk-minimization problems in Markov decision processes with countable state space and reward set. The objective is to find a policy that minimizes the probability (risk) that the total discounted reward does not exceed a specified value (target). In this kind of model, the decision made by the decision maker depends not only on the system's state but also on the target value. By introducing the decision maker's state, we formulate a framework for minimizing risk models. The policies discussed depend on target values, and the rewards may be arbitrary real numbers. For the finite horizon model, the main results obtained are: (i) the optimal value functions are distribution functions of the target, (ii) there exists an optimal deterministic Markov policy, and (iii) a policy is optimal if and only if it takes an optimal action at each realizable state. In addition, we obtain a sufficient condition and a necessary condition for the existence of a finite horizon optimal policy independent of targets, and we give an algorithm for computing finite horizon optimal policies and optimal value functions. For the infinite horizon model, we establish the optimality equation and obtain a structural property of optimal policies. We prove that the optimal value function is a distribution function of the target, and we present a new approximation formula that generalizes the one for the nonnegative rewards case. An example illustrating errors in the previous literature shows that the existence of an optimal policy has not actually been proved. In this paper, we give a necessary and sufficient condition for the existence of an infinite horizon optimal policy independent of targets, and we point out that whether an optimal policy exists in the general case remains an open problem.
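To make the finite horizon setting concrete, the following is a minimal sketch of a target-dependent backward recursion. It is not the paper's algorithm, which the abstract does not spell out. It assumes the standard recursion for this problem class: the risk with zero stages left is 1 if the target is nonnegative and 0 otherwise, and with n stages left the risk is minimized over actions after replacing the target x by (x - r) / beta for each possible reward r. The toy model `TRANSITIONS`, the discount factor `BETA`, and the function names are all hypothetical.

```python
from functools import lru_cache

BETA = 0.5  # hypothetical discount factor

# Hypothetical one-state model:
# TRANSITIONS[state][action] = list of (probability, next_state, reward)
TRANSITIONS = {
    0: {
        "safe": [(1.0, 0, 1.0)],
        "risky": [(0.5, 0, 3.0), (0.5, 0, 0.0)],
    },
}


def action_risk(n, outcomes, target):
    """Risk of one action with n stages left, then acting optimally."""
    return sum(
        p * min_risk(n - 1, nxt, (target - r) / BETA)
        for p, nxt, r in outcomes
    )


@lru_cache(maxsize=None)
def min_risk(n, state, target):
    """Minimal probability that the n-stage discounted reward is <= target."""
    if n == 0:
        # No stages left: the total reward is 0, so risk is 1 iff 0 <= target.
        return 1.0 if target >= 0 else 0.0
    return min(
        action_risk(n, outcomes, target)
        for outcomes in TRANSITIONS[state].values()
    )


def best_action(n, state, target):
    """An action attaining the minimal risk at the pair (state, target)."""
    actions = TRANSITIONS[state]
    return min(actions, key=lambda a: action_risk(n, actions[a], target))


if __name__ == "__main__":
    for x in (0.5, 2.0):
        print(x, min_risk(1, 0, x), best_action(1, 0, x))
```

In this toy model the minimizing action at the same system state changes with the target ("safe" for target 0.5, "risky" for target 2.0), which illustrates why the policies considered here depend on target values.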

Maximizing the probability of visiting a set infinitely often for a Markov decision process with Borel state and action spaces

Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space

Extreme Occupation Measures in Markov Decision Processes with an Absorbing State

On Strategic Measures and Optimality Properties in Discrete-Time Stochastic Control with Universally Measurable Policies

Risk-sensitive discounted Markov decision processes with unbounded reward functions and Borel spaces

Mean Field Markov Decision Processes

Absorbing Markov Decision Processes

Asymptotically Optimal Policies for Weakly Coupled Markov Decision Processes

A note on weak compactness of occupation measures for an absorbing Markov decision process

The minimal hitting probability of continuous-time controlled Markov systems with countable states

On Borkar and Young Relaxed Control Topologies and Continuous Dependence of Invariant Measures on Control Policy

Constrained Markov Decision Processes with Non-constant Discount Factor

Optimal models with maximizing probability of first achieving target value in the preceding stages

Continuous Time Markov Decision Processes with Nonuniformly Bounded Transition Rate: Expected Total Rewards

Minimizing Risk Models in Markov Decision Processes with Policies Depending on Target Values

Risk-Sensitive Average Markov Decision Processes in General Spaces

Probabilistic Control and Majorization of Optimal Control

Extreme occupation measures in Markov decision processes with a cemetery

Maximal reliability of controlled Markov systems

Variational optimization of probability measure spaces resolves the chain store paradox

On Maximizing Probabilities for Over-Performing a Target for Markov Decision Processes