Optimal models with maximizing probability of first achieving target value in the preceding stages

Yuanlie Lin, Congbin Wu, Boda Kang
DOI: https://doi.org/10.1360/03ys9042
2003-01-01
Abstract: Decision makers often need a performance guarantee that holds with sufficiently high probability. Such problems can be modelled as a discrete-time Markov decision process (MDP) with a probability criterion for first achieving a target value. The objective is to find a policy that maximizes the probability that the total discounted reward exceeds a target value in the preceding stages. We show that our formulation cannot be described by earlier models with standard criteria. We establish properties of the objective functions, optimal value functions and optimal policies, and give an algorithm for computing optimal policies in the finite-horizon case. For this stochastic stopping model, we prove that an optimal deterministic stationary policy exists and that the optimality equation has a unique solution. Using perturbation analysis, we approximate general models and prove the existence of an ε-optimal policy for a finite state space. We illustrate the theory with an example on the reliability of satellite systems. Finally, we extend these results to more general cases.
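The finite-horizon criterion described in the abstract can be illustrated with a small backward recursion. The sketch below is not the paper's algorithm. It assumes a standard construction in which the state is augmented with the remaining target: after a reward r is collected, the target x becomes (x - r) / BETA, and success is recorded as soon as the remaining target drops below zero. The toy model, the names MODEL, BETA and solve, and all numeric values are hypothetical.

```python
from functools import lru_cache

BETA = 0.5  # discount factor (hypothetical value, chosen for exact float arithmetic)

# Hypothetical toy model: MODEL[state][action] = (reward, {next_state: probability})
MODEL = {
    0: {
        "safe": (1.0, {0: 1.0}),
        "risky": (0.0, {0: 0.5, 1: 0.5}),
    },
    1: {
        "collect": (5.0, {0: 1.0}),
    },
}


@lru_cache(maxsize=None)
def solve(state, target, stages):
    """Return (max probability, best first action) that the total discounted
    reward exceeds `target` within `stages` remaining decisions."""
    if target < 0:
        # The accumulated discounted reward already exceeds the original target.
        return 1.0, None
    if stages == 0:
        # No decisions left and the target has not been exceeded.
        return 0.0, None
    best_prob, best_action = -1.0, None
    for action, (reward, transitions) in MODEL[state].items():
        # Rescale the remaining target so later rewards need no extra discounting.
        next_target = (target - reward) / BETA
        prob = sum(
            p * solve(next_state, next_target, stages - 1)[0]
            for next_state, p in transitions.items()
        )
        if prob > best_prob:
            best_prob, best_action = prob, action
    return best_prob, best_action


if __name__ == "__main__":
    print(solve(0, 2.0, 2))  # probability and first action for target 2 in 2 stages
    print(solve(0, 1.0, 2))  # probability and first action for target 1 in 2 stages
```

In this toy model the best first action depends on the target as well as on the state, which is why the recursion runs over (state, remaining target) pairs rather than over states alone.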