On Maximizing Probabilities for Over-Performing a Target for Markov Decision Processes

Tanhao Huang, Yanan Dai, Jinwen Chen
DOI: https://doi.org/10.1007/s11081-023-09870-4
IF: 2.619
2023-01-01
Optimization and Engineering
Abstract: This paper studies the duality between risk-sensitive control and the large-deviation control problem of maximizing the probability of out-performing a target for Markov Decision Processes. To derive this duality, we apply a non-linear extension of the Krein-Rutman Theorem to characterize the optimal risk-sensitive value and prove that there exists an optimal policy that is stationary and deterministic. The right-hand derivative of this value function is used to characterize the specific targets for which the duality holds. It is proved that the optimal policy for the “out-performing” probability can be approximated by the optimal policy for the risk-sensitive control problem. The range of the one-sided (right-hand and left-hand) derivatives of the optimal risk-sensitive value function plays an important role. Some essential differences between these two types of optimal control problems are presented.
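The risk-sensitive value mentioned in the abstract can be illustrated numerically on a small finite MDP. The sketch below is not the paper's method. It assumes finite state and action spaces with strictly positive transition probabilities, and it uses the standard multiplicative Bellman equation e^{Λ(θ)} h(x) = max_a e^{θ r(x,a)} Σ_y P(y|x,a) h(y), solved by a normalized power iteration. The function names `risk_sensitive_value` and `outperformance_rate`, the array layout, and the grid-based dual formula inf_{θ≥0} [Λ(θ) − θc] for the decay rate of the out-performing probability are all assumptions chosen for illustration; the paper's notation and precise conditions may differ.

```python
import numpy as np


def risk_sensitive_value(P, r, theta, iters=10_000, tol=1e-12):
    """Approximate the risk-sensitive value Lambda(theta) of a finite MDP.

    Hypothetical layout (an assumption of this sketch):
      P[a, x, y] : probability of moving from state x to y under action a
      r[x, a]    : one-step reward in state x under action a
    Returns (Lambda, h, policy), where policy[x] is a maximizing action,
    i.e. a stationary deterministic policy.
    """
    n_states = P.shape[1]
    h = np.ones(n_states)
    lam = 1.0
    q = np.ones_like(r, dtype=float)
    for _ in range(iters):
        # q[x, a] = exp(theta * r[x, a]) * sum_y P[a, x, y] * h[y]
        q = np.exp(theta * r) * (P @ h).T
        Th = q.max(axis=1)          # multiplicative Bellman operator
        new_lam = Th.max()          # normalizing constant, tends to e^Lambda
        new_h = Th / new_lam
        converged = (abs(new_lam - lam) < tol
                     and np.max(np.abs(new_h - h)) < tol)
        lam, h = new_lam, new_h
        if converged:
            break
    return np.log(lam), h, q.argmax(axis=1)


def outperformance_rate(P, r, target, thetas=None):
    """Grid approximation of inf over theta >= 0 of Lambda(theta) - theta * target.

    Assumed dual form for the exponential decay rate of the probability
    that the long-run average reward is at least `target`.
    """
    if thetas is None:
        thetas = np.linspace(0.0, 5.0, 51)
    values = [risk_sensitive_value(P, r, t)[0] - t * target for t in thetas]
    return min(values)
```

For example, with two states and a single action, `risk_sensitive_value` reduces to computing the logarithm of the Perron eigenvalue of the matrix with entries e^{θ r(x)} P(y|x), which gives a simple way to sanity-check the iteration against `numpy.linalg.eigvals`.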