Abstract: This paper studies risk minimization problems in Markov decision processes with a countable state space and reward set. The objective is to find a policy that minimizes the probability (risk) that the total discounted reward does not exceed a specified value (target). In this kind of model, the decision maker's choice depends not only on the system's state but also on the target value. By introducing the decision maker's state, we formulate a framework for minimizing risk models. The policies discussed depend on target values, and the rewards may be arbitrary real numbers. For the finite horizon model, the main results are: (i) the optimal value functions are distribution functions of the target; (ii) there exists an optimal deterministic Markov policy; and (iii) a policy is optimal if and only if it takes an optimal action at every realizable state. In addition, we obtain a sufficient condition and a necessary condition for the existence of a finite horizon optimal policy independent of targets, and we give an algorithm for computing finite horizon optimal policies and optimal value functions. For the infinite horizon model, we establish the optimality equation and obtain a structural property of optimal policies. We prove that the optimal value function is a distribution function of the target, and we present a new approximation formula that generalizes the nonnegative rewards case. An example exposing errors in the previous literature shows that the existence of an optimal policy has not actually been proved. We give a necessary and sufficient condition for the existence of an infinite horizon optimal policy independent of targets, and we point out that whether an optimal policy exists in the general case remains an open problem.
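The finite horizon setting can be illustrated with a minimal sketch of a target-augmented backward recursion. This is not the paper's algorithm, only a standard dynamic programming formulation assumed here for illustration: with zero stages left the risk is 1 if the target is nonnegative and 0 otherwise, and with n stages left the risk at (state, target) is the minimum over actions of the expected (n-1)-stage risk at the next state with the updated target (target - reward) / beta. The toy model `P`, the discount factor `BETA`, and the function names `risk`, `action_risk`, and `best_action` are all hypothetical.

```python
from functools import lru_cache

# Hypothetical toy model (not from the paper):
# P[state][action] = list of (probability, next_state, reward)
P = {
    0: {
        "safe": [(1.0, 0, 1.0)],
        "risky": [(0.5, 0, 3.0), (0.5, 0, 0.0)],
    },
}
BETA = 0.5  # assumed discount factor


@lru_cache(maxsize=None)
def risk(n, state, target):
    """Minimal probability that the n-stage discounted reward is <= target."""
    if n == 0:
        # No reward is left to collect, so the total is 0.
        return 1.0 if target >= 0 else 0.0
    return min(action_risk(n, state, a, target) for a in P[state])


def action_risk(n, state, action, target):
    """Risk of taking `action` now and acting optimally for the remaining stages."""
    return sum(
        prob * risk(n - 1, nxt, (target - reward) / BETA)
        for prob, nxt, reward in P[state][action]
    )


def best_action(n, state, target):
    """An action attaining the minimum in the recursion above."""
    return min(P[state], key=lambda a: action_risk(n, state, a, target))


if __name__ == "__main__":
    for x in (0.5, 2.0):
        print(x, risk(1, 0, x), best_action(1, 0, x))
```

In this toy model the minimizing action changes with the target: for a target of 0.5 the one-stage risk is minimized by `safe`, while for a target of 2.0 it is minimized by `risky`. This mirrors the abstract's point that decisions depend on the target value as well as on the system's state.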

Semi-Markov Decision Processes with Variance Minimization Criterion

Performance Optimization of Semi-Markov Decision Processes with Discounted-cost Criteria

Continuous Time Markov Decision Processes with Expected Discounted Total Rewards

Mixed Markov Decision Processes in a Semi-Markov Environment with Discounted Criterion

Risk-sensitive discounted Markov decision processes with unbounded reward functions and Borel spaces

A Sensitivity‐Based Construction Approach to Variance Minimization of Markov Decision Processes

Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs

Minimizing Risk Models in Markov Decision Processes with Policies Depending on Target Values

The Finiteness of the Reward Function and the Optimal Value Function in Markov Decision Processes

Performance Optimization for Countable Semi-Markov Decision Processes with Discounted-cost

A Discount Vanishing Approximation for Markov Decision Processes with Risk Sensitivity

Infinite-Horizon Policy-Gradient Estimation with Variable Discount Factor for Markov Decision Process

Relations Between Discounted Models and Average Models for Semi-Markov Decision Processes

Discounted cost exponential semi-Markov decision processes with unbounded transition rates: a service rate control problem with impatient customers

Optimal Stationary Policies for Semi-Markov Control Processes with Discounted-Cost Criteria

Risk‐Sensitive Markov Decision Processes with Combined Metrics of Mean and Variance

Policy Gradients with Variance Related Risk Criteria

Markov Decision Processes under Risk Sensitivity: A Discount Vanishing Approach

Error bounds of optimization algorithms for semi-Markov decision processes

Continuous Time Markov Decision Processes with Discounted Moment Criterion

Global Algorithms for Mean-Variance Optimization in Markov Decision Processes