Abstract: In this paper, we propose a new policy iteration algorithm to compute the value function and the optimal controls of continuous time stochastic control problems. The algorithm relies on successive approximations using linear-quadratic control problems which can all be solved explicitly, and only require solving linear PDEs recursively in the Markovian case. Though our procedure fails in general to produce a non-decreasing sequence like the standard algorithm, it can be made arbitrarily close to being monotone. More importantly, we recover the standard exponential speed of convergence for both the value and the controls, through purely probabilistic arguments which are significantly simpler than in the classical case. Our proof also accommodates non-Markovian dynamics as well as volatility control, allowing us to obtain the first convergence results in the latter case for a multidimensional state process.
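The abstract contrasts the new scheme with the "standard" policy iteration algorithm, whose value sequence is non-decreasing. For reference, here is a minimal sketch of that classical algorithm on a finite, discounted Markov decision process. This discrete-time setting is an assumption standing in for the continuous-time problem, and it is not the authors' LQ-based scheme. The helper names (`solve_linear`, `evaluate`, `improve`, `policy_iteration`) and the toy data `P`, `R` are hypothetical. Policy evaluation solves a linear system, the discrete analogue of the linear PDE solved at each step in the Markovian case.

```python
# Classical policy iteration on a finite, discounted MDP (maximization).
# P[s][a] is the next-state distribution, R[s][a] the reward (hypothetical data).

def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x

def evaluate(policy, P, R, gamma):
    """Policy evaluation: solve the linear system (I - gamma * P_pi) v = r_pi."""
    n = len(policy)
    M = [[(1.0 if i == j else 0.0) - gamma * P[i][policy[i]][j] for j in range(n)]
         for i in range(n)]
    b = [R[i][policy[i]] for i in range(n)]
    return solve_linear(M, b)

def improve(v, P, R, gamma):
    """Policy improvement: greedy action for the one-step lookahead value."""
    new_policy = []
    for s in range(len(v)):
        q = [R[s][a] + gamma * sum(p * vj for p, vj in zip(P[s][a], v))
             for a in range(len(R[s]))]
        new_policy.append(max(range(len(q)), key=q.__getitem__))
    return new_policy

def policy_iteration(P, R, gamma, max_iter=100):
    """Alternate evaluation and improvement; return final policy and value history."""
    policy, history = [0] * len(R), []
    for _ in range(max_iter):
        v = evaluate(policy, P, R, gamma)
        history.append(v)
        new_policy = improve(v, P, R, gamma)
        if new_policy == policy:
            break
        policy = new_policy
    return policy, history

# Two states; action 0 leads to state 0, action 1 leads to state 1.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
R = [[0.0, 1.0], [2.0, 0.5]]
policy, history = policy_iteration(P, R, gamma=0.9)
print(policy)  # [1, 0]
```

Each entry of `history` dominates the previous one componentwise, which is the monotonicity property the abstract says the new procedure only approximately retains.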

What problem does this paper attempt to address?

This paper addresses the computation of the value function and the optimal controls of continuous-time stochastic control problems, including non-Markovian ones. Specifically, the authors propose a new policy iteration algorithm whose successive approximations are linear-quadratic control problems that can all be solved explicitly, thus avoiding the difficulties that standard methods face in high-dimensional and non-convex optimization problems. In the Markovian case, each step only requires solving a linear PDE. The algorithm also accommodates non-Markovian dynamics and volatility control, and in the latter case it yields the first convergence results for a multidimensional state process.

### Main contributions of the paper:

1. **Algorithm innovation**: A new policy iteration algorithm is proposed. Built on probabilistic arguments, it handles non-Markovian control problems and takes a natural partial differential equation (PDE) form in the Markovian case.
2. **Convergence speed**: The exponential speed of convergence of the standard policy iteration algorithm is recovered for both the value function and the controls, with a purely probabilistic proof that is simpler and avoids complex PDE techniques.
3. **Scope of application**: The algorithm applies not only to standard stochastic control problems but also to problems with controlled volatility, a particularly difficult case in which previous research mainly covered one-dimensional dynamics.
4. **Theoretical basis**: A vanishing penalty term is introduced to accelerate the convergence of the approximations, so that the value function and the optimal control can be approximated by explicit functions.
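To make the idea of "successive linear-quadratic approximations, each solved through a linear equation" concrete, the sketch below runs the classical Kleinman policy iteration for a scalar, deterministic, infinite-horizon LQ regulator. The model (`a`, `b`, `q`, `r`, feedback `u = -k x`) and the function names are assumptions chosen for simplicity; the paper treats finite-horizon, possibly non-Markovian stochastic problems, and this is not its algorithm. Policy evaluation reduces to a linear (Lyapunov) equation, and the iterates converge to the positive root of the Riccati equation.

```python
import math

# Classical Kleinman policy iteration for a scalar LQ regulator (assumed model):
#   dx/dt = a x + b u,  cost = integral of (q x^2 + r u^2) dt,  feedback u = -k x.

def evaluate_gain(k, a, b, q, r):
    """Policy evaluation: cost coefficient p of u = -k x via a linear (Lyapunov) equation."""
    a_cl = a - b * k  # closed-loop drift
    if a_cl >= 0:
        raise ValueError("gain k is not stabilizing")
    # Solve 2 * a_cl * p + q + r * k**2 = 0 for p.
    return (q + r * k * k) / (-2.0 * a_cl)

def kleinman(k0, a, b, q, r, tol=1e-12, max_iter=50):
    """Alternate evaluation and improvement, starting from a stabilizing gain k0."""
    k, costs = k0, []
    for _ in range(max_iter):
        p = evaluate_gain(k, a, b, q, r)
        costs.append(p)
        # Minimizing r u^2 + 2 p x b u over u gives u = -(b p / r) x.
        k_new = b * p / r
        if abs(k_new - k) < tol:
            return p, k_new, costs
        k = k_new
    return p, k, costs

def riccati_solution(a, b, q, r):
    """Positive root of the algebraic Riccati equation 2 a p - (b^2 / r) p^2 + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)

p, k, costs = kleinman(k0=3.0, a=1.0, b=1.0, q=1.0, r=1.0)
print(p, riccati_solution(1.0, 1.0, 1.0, 1.0))  # both close to 1 + sqrt(2)
```

Here the costs decrease monotonically (this is a minimization), and the error shrinks very quickly, which mirrors the fast convergence discussed above only in spirit.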
### Main results:

- **Approximation of the value function**: For each function \(u^*\) satisfying the stated conditions, the iteration scheme \((V_n,\hat{Y}_n,Z_n,\alpha_n)_{n\in\mathbb{N}}\) admits a constant \(C > 0\) such that
\[ -\frac{C}{2^n}\leq V_n - V\leq\frac{C}{\phi(n)},\quad\forall n\in\mathbb{N}. \]
- **Approximation of the optimal control**: Under stronger conditions, a pointwise convergence rate is obtained for each \(t\in[0,T]\).

### Method comparison:

- **Differences from the existing literature**: Unlike existing policy iteration algorithms, this method does not rely on a relaxation of the control problem and needs no additional approximation step. It also covers volatility-controlled problems with a multidimensional state process, whereas existing results usually deal only with one-dimensional dynamics.

### Conclusion:

This paper provides a new policy iteration algorithm that applies to non-Markovian and volatility-controlled stochastic control problems and is of both theoretical and practical interest. By relying on probabilistic arguments, it simplifies the convergence proofs, broadens the scope of application, and fills a gap in the existing research.
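The two-sided estimate stated under Main results says that \(V_n\) can undershoot \(V\) by at most \(C/2^n\) and overshoot it by at most \(C/\phi(n)\). As a minimal sketch for reading that bound numerically, the hypothetical helper `smallest_constant` below takes errors \(e_n = V_n - V\) at a few iterations and an assumed choice of \(\phi\), and returns the smallest \(C\) for which the estimate holds. The error values and \(\phi(n) = n^2\) are illustrative assumptions, not data from the paper.

```python
from typing import Callable, Iterable, Tuple

def smallest_constant(errors: Iterable[Tuple[int, float]],
                      phi: Callable[[int], float]) -> float:
    """Smallest C >= 0 with -C / 2**n <= e_n <= C / phi(n) for every given (n, e_n).

    e_n stands for V_n - V; phi is assumed positive at each n supplied.
    """
    c = 0.0
    for n, e in errors:
        if e < 0:
            c = max(c, -e * 2.0 ** n)  # lower side: undershoot decays geometrically
        else:
            c = max(c, e * phi(n))     # upper side: overshoot decays like 1 / phi(n)
    return c

# Hypothetical errors at n = 1, 2, 3 with phi(n) = n**2.
errors = [(1, -0.25), (2, 0.1), (3, -0.05)]
C = smallest_constant(errors, lambda n: n ** 2)
print(C)  # 0.5
```

A faster-growing \(\phi\) tightens the upper side of the bound, which is one way to read the abstract's remark that the procedure can be made arbitrarily close to monotone.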

A policy iteration algorithm for non-Markovian control problems

Continuous-Time Distributed Policy Iteration for Multicontroller Nonlinear Systems

Policy Iteration Algorithm for Singular Controlled Diffusion Processes

From Optimization to Control: Quasi Policy Iteration

Value-Gradient Iteration with Quadratic Approximate Value Functions

Policy Iteration for Multiplicative Noise Output Feedback Control

Temporal Difference-Based Policy Iteration for Optimal Control of Stochastic Systems

Approximate Midpoint Policy Iteration for Linear Quadratic Control

Policy Iteration Based Feedback Control

Relaxed Policy Iteration Algorithm for Nonlinear Zero-Sum Games with Application to H-infinity Control

Data-driven policy iteration algorithm for continuous-time stochastic linear-quadratic optimal control problems

A Novel Policy Iteration Algorithm for Nonlinear Continuous-Time H∞ Control Problem

Data-driven Policy Iteration Algorithm for Optimal Control of Continuous-Time Itô Stochastic Systems with Markovian Jumps

Policy Iteration Algorithm for Constrained Cost Optimal Control of Discrete-Time Nonlinear System

Easy Monotonic Policy Iteration

Two‐loop reinforcement learning algorithm for finite‐horizon optimal control of continuous‐time affine nonlinear systems

Approximate Finite-Horizon Optimal Control with Policy Iteration

Policy Iteration Reinforcement Learning Method for Continuous-Time Linear-Quadratic Mean-Field Control Problems

Algorithms for optimization and stabilization of controlled Markov chains

Adaptive Optimal Control for a Class of Continuous-Time Affine Nonlinear Systems with Unknown Internal Dynamics