Optimization of Parametric Policies of Markov Decision Processes Under a Variance Criterion

Li Xia
DOI: https://doi.org/10.1109/WODES.2016.7497868
2016-01-01
Abstract: The variance criterion is an uncommon but important criterion in Markov decision processes (MDPs). The nonlinear (quadratic) structure of the variance function gives the problem a non-Markovian property, which makes traditional MDP approaches invalid. In this paper, we study the optimization of parametric policies of MDPs under the variance criterion, where the optimization parameters are the probabilities of selecting each action at each state. Following the basic idea of sensitivity-based optimization, we derive a difference formula and a derivative formula for the reward variance with respect to the system parameters. The variance difference formula is fundamental to this problem, and it partly handles the difficulty caused by the nonlinearity of the variance function through a nonnegative term. With these sensitivity formulas, we prove that the optimal policy with the minimal variance can be found in the deterministic policy space. A necessary condition for the optimal policy is also derived. Compared with the gradient-based approaches in the literature, our approach provides a clear viewpoint on this variance optimization problem.
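To make the setting concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the reward variance is the long-run average of the squared deviation of the reward from its long-run mean, and it uses a hypothetical two-state, two-action MDP whose transition probabilities and rewards are invented for illustration. The parameters `theta0` and `theta1` are the probabilities of selecting action 0 in states 0 and 1. The sketch evaluates the variance on a grid of randomized policies and compares the best grid value with the best of the four deterministic policies.

```python
from itertools import product

# Hypothetical 2-state, 2-action MDP; all numbers are illustrative assumptions.
# P[i][a] = (probability of moving to state 0, probability of moving to state 1)
P = [[(0.7, 0.3), (0.2, 0.8)],
     [(0.4, 0.6), (0.9, 0.1)]]
# R[i][a] = reward received when action a is taken in state i
R = [[1.0, 3.0],
     [2.0, 0.5]]


def mean_and_variance(theta0, theta1):
    """Long-run mean and variance of the reward under a randomized policy.

    theta0, theta1: probability of selecting action 0 in state 0 and state 1.
    """
    theta = [(theta0, 1.0 - theta0), (theta1, 1.0 - theta1)]
    # Transition probabilities of the Markov chain induced by the policy.
    p01 = sum(theta[0][a] * P[0][a][1] for a in range(2))
    p10 = sum(theta[1][a] * P[1][a][0] for a in range(2))
    # Closed-form stationary distribution of a two-state chain.
    pi = (p10 / (p01 + p10), p01 / (p01 + p10))
    pairs = list(product(range(2), range(2)))
    # Long-run average reward.
    eta = sum(pi[i] * theta[i][a] * R[i][a] for i, a in pairs)
    # Long-run average squared deviation from the mean (assumed definition).
    var = sum(pi[i] * theta[i][a] * (R[i][a] - eta) ** 2 for i, a in pairs)
    return eta, var


# Randomized policies on a coarse grid that includes the deterministic corners.
grid = [k / 20 for k in range(21)]
best_grid = min(mean_and_variance(t0, t1)[1] for t0, t1 in product(grid, grid))
# The four deterministic policies only.
best_det = min(mean_and_variance(t0, t1)[1]
               for t0, t1 in product((0.0, 1.0), repeat=2))

print("minimum variance over the grid:         ", best_grid)
print("minimum variance, deterministic policies:", best_det)
```

In this toy instance the two printed minima should coincide, which is consistent with the abstract's claim that a minimum-variance policy can be found among deterministic policies. A grid search of this kind only illustrates the claim on one example; it does not use the difference or derivative formulas developed in the paper.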