A policy iteration algorithm for non-Markovian control problems

Dylan Possamaï, Ludovic Tangpi
2024-09-06
Abstract: In this paper, we propose a new policy iteration algorithm to compute the value function and the optimal controls of continuous time stochastic control problems. The algorithm relies on successive approximations using linear-quadratic control problems which can all be solved explicitly, and only require solving linear PDEs recursively in the Markovian case. Though our procedure fails in general to produce a non-decreasing sequence like the standard algorithm, it can be made arbitrarily close to being monotone. More importantly, we recover the standard exponential speed of convergence for both the value and the controls, through purely probabilistic arguments which are significantly simpler than in the classical case. Our proof also accommodates non-Markovian dynamics as well as volatility control, allowing us to obtain the first convergence results in the latter case for a multi-dimensional state process.
Optimization and Control, Probability
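To illustrate what it means for a linear-quadratic subproblem to be "solved explicitly", here is a minimal sketch of the textbook scalar case; it is a generic example, not the specific subproblems used in the paper. Assuming dynamics \(dX_t = (A X_t + B a_t)\,dt + \sigma\,dW_t\) and the cost \(\mathbb{E}[\int_0^T (q X_t^2 + r a_t^2)\,dt + g X_T^2]\) with \(r > 0\), the value function is \(P(t)x^2 + c(t)\), where \(P\) solves the Riccati ODE \(P' = -2AP + (B^2/r)P^2 - q\) with \(P(T) = g\), \(c(t) = \sigma^2 \int_t^T P(s)\,ds\), and the optimal feedback is \(a^*(t,x) = -(B/r)P(t)x\). The function names, the RK4 integrator and the trapezoidal rule are illustrative choices.

```python
from typing import List, Tuple


def solve_scalar_riccati(A: float, B: float, q: float, r: float, g: float,
                         T: float, n_steps: int = 1000) -> Tuple[List[float], List[float]]:
    """Integrate P' = -2*A*P + (B**2/r)*P**2 - q backward from P(T) = g with classical RK4.

    Returns the time grid t_0 = 0 < ... < t_N = T and the values P(t_k).
    """
    dt = T / n_steps

    def rhs(p: float) -> float:
        # Constant coefficients: the Riccati right-hand side does not depend on t.
        return -2.0 * A * p + (B ** 2 / r) * p ** 2 - q

    P = [0.0] * (n_steps + 1)
    P[n_steps] = g
    h = -dt  # negative step: march from t = T back to t = 0
    for k in range(n_steps, 0, -1):
        p = P[k]
        k1 = rhs(p)
        k2 = rhs(p + 0.5 * h * k1)
        k3 = rhs(p + 0.5 * h * k2)
        k4 = rhs(p + h * k3)
        P[k - 1] = p + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    times = [k * dt for k in range(n_steps + 1)]
    return times, P


def lq_solution(A: float, B: float, q: float, r: float, g: float,
                sigma: float, T: float, n_steps: int = 1000):
    """Return the optimal feedback on the time grid and the value v(0, x) = P(0) x^2 + c(0)."""
    _, P = solve_scalar_riccati(A, B, q, r, g, T, n_steps)
    dt = T / n_steps
    # c(0) = sigma^2 * int_0^T P(s) ds, approximated with the trapezoidal rule.
    c0 = sigma ** 2 * dt * (sum(P) - 0.5 * (P[0] + P[-1]))

    def feedback(k: int, x: float) -> float:
        # Optimal control at grid time t_k: a* = -(B / r) * P(t_k) * x.
        return -(B / r) * P[k] * x

    def value0(x: float) -> float:
        return P[0] * x ** 2 + c0

    return feedback, value0
```

For \(A = 0\), \(B = r = g = 1\), \(q = 0\) and \(T = 1\), the exact solution is \(P(t) = 1/(2 - t)\), so \(P(0) = 1/2\) and \(c(0) = \sigma^2 \ln 2\); the sketch reproduces these values up to discretization error. No pointwise optimization is performed anywhere: the control is read off from \(P\).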
What problem does this paper attempt to address?
This paper addresses the computation of the value function and the optimal controls of non-Markovian stochastic control problems. Specifically, the authors propose a new policy iteration algorithm for continuous-time stochastic control problems. The algorithm builds successive approximations from linear-quadratic control problems, each of which can be solved explicitly, thereby avoiding the difficulties that standard methods face with high-dimensional and non-convex optimization problems. The algorithm applies not only to non-Markovian dynamics but also to problems with controlled volatility, a setting in which the paper obtains the first convergence results for a multi-dimensional state process.

### Main contributions of the paper:

1. **Algorithm innovation**: A new policy iteration algorithm is proposed. It relies on probabilistic arguments, which allows it to handle non-Markovian control problems, and it takes a natural partial differential equation (PDE) form in the Markovian case.
2. **Convergence speed**: The exponential convergence speed of the standard policy iteration algorithm is recovered for both the value function and the controls, through purely probabilistic proofs that are significantly simpler than the classical PDE-based arguments.
3. **Scope of application**: Beyond standard stochastic control problems, the algorithm covers problems with controlled volatility, which are particularly difficult; previous research mainly focused on one-dimensional dynamics.
4. **Theoretical basis**: A vanishing penalty term is introduced to speed up the convergence of the approximation scheme, so that the value function and the optimal control are approximated by explicit functions.
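For comparison, the classical policy iteration (Howard's algorithm) alternates a linear policy-evaluation step with a pointwise policy-improvement step, and produces a monotone sequence of values (non-decreasing here, since rewards are maximized). The sketch below runs it on a small discounted finite-state Markov decision problem; it is the standard algorithm the paper contrasts with, not the authors' linear-quadratic scheme, and the transition matrices and rewards are arbitrary illustrative numbers. Evaluation solves the linear system \((I - \gamma P_\pi)v = R_\pi\), a discrete analogue of the linear PDE solved at each step in the continuous-time Markovian setting.

```python
import numpy as np


def policy_iteration(P: np.ndarray, R: np.ndarray, gamma: float, max_iter: int = 100):
    """Howard's policy iteration for a finite, discounted, reward-maximizing MDP.

    P has shape (n_actions, n_states, n_states): P[a, s, s'] is a transition probability.
    R has shape (n_actions, n_states): R[a, s] is the one-step reward.
    Returns the final value, the final policy and the list of evaluated values.
    """
    n_states = R.shape[1]
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)  # start from "always take action 0"
    history = []
    for _ in range(max_iter):
        # Policy evaluation: solve the linear system (I - gamma * P_pi) v = R_pi.
        P_pi = P[policy, states, :]
        R_pi = R[policy, states]
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        history.append(v)
        # Policy improvement: maximize the one-step lookahead state by state.
        q = R + gamma * (P @ v)  # shape (n_actions, n_states)
        new_policy = np.argmax(q, axis=0)
        if np.array_equal(new_policy, policy):
            break  # policy is greedy w.r.t. its own value, so the Bellman equation holds
        policy = new_policy
    return v, policy, history


if __name__ == "__main__":
    # Arbitrary two-state, two-action example (illustrative numbers only).
    P = np.array([[[0.9, 0.1], [0.1, 0.9]],
                  [[0.2, 0.8], [0.7, 0.3]]])
    R = np.array([[1.0, 0.0], [0.5, 2.0]])
    v, policy, history = policy_iteration(P, R, gamma=0.9)
    for n, v_n in enumerate(history):
        print(n, v_n)  # each row is componentwise >= the previous one
```

The improvement step is where the standard algorithm performs a pointwise optimization over controls; in continuous state and control spaces this is the step that can become high-dimensional and non-convex.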
### Main results:

- **Approximation of the value function**: For each function \(u^*\) satisfying the paper's assumptions, the iteration scheme \((V_n,\hat{Y}_n,Z_n,\alpha_n)_{n\in\mathbb{N}}\) defined in the paper admits a constant \(C > 0\) such that
\[
-\frac{C}{2^n}\leq V_n - V\leq\frac{C}{\phi(n)},\quad\forall n\in\mathbb{N}.
\]
- **Approximation of the optimal control**: Under stronger conditions, a pointwise convergence rate for the controls is obtained for each \(t\in[0,T]\).

### Method comparison:

- **Differences from the existing literature**: Unlike existing policy iteration algorithms, the method does not rely on a relaxation of the control problem and requires no additional approximation steps. It also handles volatility-controlled problems with a multi-dimensional state process, whereas existing methods usually deal only with one-dimensional dynamics.

### Conclusion:

This paper provides a new policy iteration algorithm for non-Markovian and volatility-controlled stochastic control problems, of both theoretical and practical interest. Its purely probabilistic arguments simplify the convergence proofs, broaden the scope of application, and fill gaps in the existing literature.
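The two-sided bound stated under Main results can be turned into an iteration count: the lower gap \(C/2^n\) drops below a tolerance \(\varepsilon\) as soon as \(n \ge \log_2(C/\varepsilon)\), while the upper gap \(C/\phi(n)\) depends on \(\phi\), which this summary does not specify. The hypothetical helper `iterations_for_tolerance` below therefore takes \(\phi\) as an argument. The constant \(C\) is also unknown in practice, and the choices \(\phi(n) = 2^n\) and \(\phi(n) = n^2 + 1\) in the usage lines are purely illustrative assumptions.

```python
from typing import Callable


def iterations_for_tolerance(C: float, eps: float, phi: Callable[[int], float],
                             n_max: int = 10_000) -> int:
    """Smallest n with C / 2**n <= eps and C / phi(n) <= eps.

    Under the bound -C/2**n <= V_n - V <= C/phi(n), this n guarantees |V_n - V| <= eps.
    `phi` is a user-supplied (hypothetical) positive sequence; it is not specified here.
    """
    if C <= 0 or eps <= 0:
        raise ValueError("C and eps must be positive")
    for n in range(n_max + 1):
        lower_gap = C * 0.5 ** n  # exponential decay of the lower gap
        phi_n = phi(n)
        if phi_n <= 0:
            continue  # the upper bound is uninformative when phi(n) <= 0
        upper_gap = C / phi_n
        if lower_gap <= eps and upper_gap <= eps:
            return n
    raise ValueError("tolerance not reached within n_max iterations")


if __name__ == "__main__":
    # Illustrative choices of phi only; the paper's phi is not given in this summary.
    print(iterations_for_tolerance(1.0, 1e-3, lambda n: 2.0 ** n))  # 10
    print(iterations_for_tolerance(1.0, 1e-3, lambda n: n ** 2 + 1))  # 32
```

With \(\phi(n) = 2^n\) both gaps decay at the same exponential rate, so the answer is \(\lceil \log_2 1000 \rceil = 10\); with a polynomial \(\phi\) the upper gap dominates and many more iterations are needed, which is why the growth of \(\phi\) matters for the overall rate.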