Abstract: We consider inexact policy iteration (iPI) methods for large-scale, infinite-horizon, discounted Markov decision processes (MDPs) with finite state and action spaces. iPI is a variant of policy iteration in which the policy evaluation step is carried out inexactly by an iterative solver for linear systems. In the classical dynamic programming literature, a similar principle underlies optimistic policy iteration, where a number of value-iteration sweeps fixed a priori is used to solve the policy evaluation step inexactly. Inspired by the connection between policy iteration and the semismooth Newton method, we investigate a class of iPI methods that mimic inexact variants of the semismooth Newton method by adopting a parametric stopping condition to regulate the inexactness of the policy evaluation step. For this class of methods, we discuss local and global convergence properties and derive a practical range of values for the stopping-condition parameter that guarantees contraction. Our analysis is general and therefore covers a variety of iterative solvers for policy evaluation, from standard value iteration to more sophisticated methods such as GMRES. As the analysis highlights, the choice of inner solver is critical to the performance of the overall method. We therefore consider several iterative methods for the policy evaluation step and analyze their applicability and contraction properties in this setting. We show that the specific structure of policy evaluation tends to enhance the contraction properties of these methods, and that there is substantial room for improvement in convergence rate. Finally, we study the numerical performance of different instances of iPI on large-scale MDPs arising in the design of health policies to control the spread of infectious diseases.
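The idea of regulating the inexactness of policy evaluation with a parametric stopping condition can be illustrated with a minimal sketch. It assumes a cost-minimizing MDP stored as dense nested lists (`P[a][s][t]` for transition probabilities, `g[a][s]` for stage costs) and a relative-residual rule of the inexact-Newton type, ‖T_π w − w‖∞ ≤ η‖T v − v‖∞, where T_π w − w = g_π − (I − γP_π)w is the residual of the policy-evaluation system. Value-iteration sweeps serve as the inner solver. All names (`inexact_pi`, `eta`, and so on) are hypothetical, and the exact stopping condition and admissible parameter range derived in the paper are not reproduced here.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def greedy_backup(P: List[Matrix], g: Matrix, gamma: float,
                  v: List[float]) -> Tuple[List[float], List[int]]:
    """Cost-minimizing Bellman backup: returns (T v, a greedy policy for v)."""
    n = len(v)
    Tv, policy = [], []
    for s in range(n):
        q = [g[a][s] + gamma * sum(P[a][s][t] * v[t] for t in range(n))
             for a in range(len(P))]
        a_best = q.index(min(q))
        Tv.append(q[a_best])
        policy.append(a_best)
    return Tv, policy


def policy_backup(P: List[Matrix], g: Matrix, gamma: float,
                  policy: List[int], v: List[float]) -> List[float]:
    """T_pi v = g_pi + gamma * P_pi v for a deterministic policy pi."""
    n = len(v)
    return [g[policy[s]][s]
            + gamma * sum(P[policy[s]][s][t] * v[t] for t in range(n))
            for s in range(n)]


def sup_dist(x: List[float], y: List[float]) -> float:
    return max(abs(a - b) for a, b in zip(x, y))


def inexact_pi(P: List[Matrix], g: Matrix, gamma: float, eta: float = 0.5,
               tol: float = 1e-8, max_outer: int = 1000):
    """Hypothetical iPI sketch with value-iteration sweeps as the inner solver.

    The inner loop stops once ||T_pi w - w||_inf <= eta * ||T v - v||_inf.
    eta must be positive; each sweep shrinks the residual by at least gamma.
    """
    n = len(g[0])
    c = max(max(row) for row in g) / (1.0 - gamma)
    v = [c] * n  # T v <= v at this start, so iterates decrease monotonically to v*
    sweeps = []  # number of inner sweeps used at each outer iteration
    for _ in range(max_outer):
        Tv, policy = greedy_backup(P, g, gamma, v)
        outer_res = sup_dist(Tv, v)  # Bellman residual = initial inner residual
        if outer_res <= tol:
            return v, policy, sweeps
        w, m = Tv, 1  # first sweep: T_pi v equals T v because pi is greedy for v
        while True:
            Tw = policy_backup(P, g, gamma, policy, w)
            if sup_dist(Tw, w) <= eta * outer_res:
                break
            w, m = Tw, m + 1
        v = w
        sweeps.append(m)
    raise RuntimeError("no convergence within max_outer outer iterations")


# Two states; action 0 stays in place, action 1 switches to the other state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
g = [[2.0, 1.0],   # g[0][s]: cost of staying
     [0.5, 3.0]]   # g[1][s]: cost of switching
v, policy, sweeps = inexact_pi(P, g, gamma=0.9, eta=0.5)
print(policy)  # [1, 0]
```

With `eta >= 1` the rule accepts the first sweep, so the sketch reduces to value iteration; smaller `eta` spends more sweeps per outer iteration and moves it toward exact policy iteration. A Krylov method such as GMRES applied to (I − γP_π)w = g_π could replace the inner loop under the same residual test.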