Abstract: We consider inexact policy iteration (iPI) methods for large-scale infinite-horizon discounted MDPs with finite spaces. These methods are a variant of policy iteration in which the policy evaluation step is solved inexactly using an iterative solver for linear systems. In the classical dynamic programming literature, a similar principle is deployed in optimistic policy iteration, where an a priori fixed number of value iteration steps is used to solve the policy evaluation step inexactly. Inspired by the connection between policy iteration and the semismooth Newton method, we investigate a class of iPI methods that mimic the inexact variants of the semismooth Newton method by adopting a parametric stopping condition to regulate the level of inexactness of the policy evaluation step. For this class of methods we discuss local and global convergence properties and derive a practical range of values for the stopping-condition parameter that provide contraction guarantees. Our analysis is general and therefore encompasses a variety of iterative solvers for policy evaluation, including standard value iteration as well as more sophisticated solvers such as GMRES. As our analysis underlines, the choice of the inner solver is of fundamental importance for the performance of the overall method. We therefore consider different iterative methods for the policy evaluation step and analyze their applicability and contraction properties in that setting. We show that the contraction properties of these methods tend to be enhanced by the specific structure of policy evaluation and that there is room for substantial improvement in convergence rate. Finally, we study the numerical performance of different instances of inexact policy iteration on large-scale MDPs arising in the design of health policies to control the spread of infectious diseases in epidemiology.
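The scheme described in the abstract can be illustrated with a minimal sketch. The following is not the paper's implementation: the function name `inexact_policy_iteration`, the array layout, the infinity norm, and the exact form of the stopping rule (inner linear residual at most `alpha` times the current Bellman residual, modeled on inexact Newton methods) are all assumptions made for illustration. The inner solver here is plain value iteration for the fixed policy; a Krylov solver such as GMRES could take its place.

```python
import numpy as np


def inexact_policy_iteration(P, g, gamma, alpha=0.1, tol=1e-8,
                             max_outer=1000, max_inner=10_000):
    """Illustrative inexact policy iteration for a cost-minimizing MDP.

    P: (A, S, S) array, P[a, s, t] = probability of moving s -> t under a.
    g: (S, A) array of stage costs.
    alpha: hypothetical stopping-condition parameter in (0, 1).
    """
    n_states = g.shape[0]
    states = np.arange(n_states)
    v = np.zeros(n_states)
    for _ in range(max_outer):
        # Policy improvement: one Bellman backup and its greedy policy.
        q = g + gamma * np.einsum("ast,t->sa", P, v)
        pi = q.argmin(axis=1)
        bellman_res = np.max(np.abs(q.min(axis=1) - v))
        if bellman_res <= tol:
            break
        P_pi = P[pi, states]   # (S, S) transition matrix of the greedy policy
        g_pi = g[states, pi]   # (S,) stage costs of the greedy policy
        # Inexact policy evaluation: iterate on (I - gamma * P_pi) v = g_pi
        # until the linear residual is small relative to the Bellman residual.
        for _ in range(max_inner):
            v = g_pi + gamma * P_pi @ v
            lin_res = np.max(np.abs(g_pi + gamma * P_pi @ v - v))
            if lin_res <= alpha * bellman_res:  # assumed stopping rule
                break
    return v, pi
```

In this sketch, a smaller `alpha` forces a more accurate evaluation per outer iteration, while a larger `alpha` makes each outer iteration cheaper; replacing the relative test with a fixed inner iteration count would give an optimistic policy iteration variant.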

A note on the policy iteration algorithm for discounted Markov decision processes for a class of semicontinuous models

Policy iteration for discrete-time systems with discounted costs: stability and near-optimality guarantees

A policy iteration algorithm for non-Markovian control problems

On policy iteration‐based discounted optimal control

Inexact Policy Iteration Methods for Large-Scale Markov Decision Processes

On the Use of Non-Stationary Policies for Infinite-Horizon Discounted Markov Decision Processes

Approximate Policy Iteration Schemes: A Comparison

On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes

Policy Iteration for Exploratory Hamilton--Jacobi--Bellman Equations

Markov Decision Processes with Incomplete Information and Semi-Uniform Feller Transition Probabilities

Performance Optimization of Semi-Markov Decision Processes with Discounted-cost Criteria

On Convergence Analysis of Policy Iteration Algorithms for Entropy-Regularized Stochastic Control Problems

Model-Free $δ$-Policy Iteration Based on Damped Newton Method for Nonlinear Continuous-Time H$\infty$ Tracking Control

On the Convergence of Modified Policy Iteration in Risk Sensitive Exponential Cost Markov Decision Processes

Relaxed Policy Iteration Algorithm for Nonlinear Zero-Sum Games with Application to H-infinity Control

Sufficiency of Markov Policies for Continuous-Time Jump Markov Decision Processes

Continuity of Discounted Values and the Structure of Optimal Policies for Periodic-Review Inventory Control with Setup Costs

Easy Monotonic Policy Iteration

From Optimization to Control: Quasi Policy Iteration

A New Continuous-Time Policy Iteration for Time-Varying Nonlinear Systems

A Novel Policy Iteration Algorithm for Nonlinear Continuous-Time H$\infty$ Control Problem