Abstract: We study a variance minimization problem in an infinite-stage discrete-time Markov decision process (MDP), regardless of the mean performance. Under the variance criterion, the value of the cost function at the current stage is affected by future actions, so the problem is not a standard MDP and traditional MDP theory does not apply. We convert the variance minimization problem into a standard MDP by introducing a concept called pseudo variance. We then derive a variance difference formula that quantifies the difference between the variances of Markov systems under any two policies. With this difference formula, the correlation of the variance cost function across stages can be decoupled through a nonnegative term. We obtain a necessary condition for the optimal policy and prove that a policy with minimal variance can be found in the deterministic policy space. Furthermore, we propose an efficient iterative algorithm to reduce the variance of Markov systems and prove that it converges to a local optimum. Finally, a numerical experiment demonstrates the efficiency of our algorithm compared with the gradient-based method widely adopted in the literature.
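
To make the role of a pseudo variance concrete, the following is a minimal sketch and not the paper's algorithm. It assumes one common formalization that the abstract does not spell out: for a fixed policy with ergodic transition matrix `P` and per-state reward `r`, the steady-state variance is the stationary average of `(r - eta)^2`, where `eta` is the long-run mean, and the pseudo variance replaces `eta` with a fixed reference value `ref`. The function names, the two-state chain, and the numbers are all hypothetical.

```python
import numpy as np


def stationary_distribution(P):
    """Stationary distribution of an ergodic transition matrix P (rows sum to 1)."""
    n = P.shape[0]
    # Stack the balance equations pi @ P = pi with the normalization sum(pi) = 1
    # and solve the resulting overdetermined linear system by least squares.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def mean_and_variance(P, r):
    """Long-run mean and steady-state variance of the reward (assumed definition)."""
    pi = stationary_distribution(P)
    eta = pi @ r
    var = pi @ (r - eta) ** 2  # this cost depends on eta, hence on the whole policy
    return eta, var


def pseudo_variance(P, r, ref):
    """Average of (r - ref)^2 with a fixed reference: an ordinary per-state cost."""
    pi = stationary_distribution(P)
    return pi @ (r - ref) ** 2


# Hypothetical two-state chain under some fixed policy.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
r = np.array([1.0, 4.0])

eta, var = mean_and_variance(P, r)
pv = pseudo_variance(P, r, ref=2.0)

# Under these definitions the pseudo variance exceeds the true variance by the
# nonnegative term (eta - ref)^2.
assert np.isclose(pv, var + (eta - 2.0) ** 2)
```

Under these assumed definitions, the per-state cost `(r - ref)^2` no longer depends on the policy through its mean, which is what lets a standard average-cost MDP formulation apply; the gap between the two quantities is the nonnegative term `(eta - ref)^2`.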

Markov Decision Processes with Variance Minimization: A New Condition and Approach

Semi-Markov Decision Processes with Variance Minimization Criterion

Optimization of Markov Decision Processes under the Variance Criterion

Average Optimality for Markov Decision Processes in Borel Spaces: a New Condition and Approach

Nonstationary Denumerable State Markov Decision Processes – with Average Variance Criterion

Variance Minimization of Parameterized Markov Decision Processes

New Average Optimality Conditions for Semi-Markov Decision Processes in Borel Spaces

Another Set of Conditions for Markov Decision Processes with Average Sample-Path Costs

A Sensitivity‐Based Construction Approach to Variance Minimization of Markov Decision Processes

Another Set of Verifiable Conditions for Average Markov Decision Processes with Borel Spaces

A Semimartingale Characterization of Average Optimal Stationary Policies for Markov Decision Processes

Optimization of Parametric Policies of Markov Decision Processes under a Variance Criterion

Unbounded Cost Markov Decision Processes with Limsup and Liminf Average Criteria: New Conditions

The Average Variance Criterion for Nonstationary MDP with Borel State Space

First Passage Markov Decision Processes with Constraints and Varying Discount Factors

Mean-Variance Criteria for Finite Continuous-Time Markov Decision Processes

Mean-Variance Optimization of Discrete Time Discounted Markov Decision Processes

A Mean–variance Optimization Problem for Discounted Markov Decision Processes

New Discount and Average Optimality Conditions for Continuous-Time Markov Decision Processes

A Note on Optimality Conditions for Continuous-Time Markov Decision Processes with Average Cost Criterion

Mean-Variance Optimality for Semi-Markov Decision Processes under First Passage Criteria