Randomized Optimal Stopping Problem in Continuous Time and Reinforcement Learning Algorithm

Yuchao Dong
DOI: https://doi.org/10.1137/22m1516725
IF: 2.2
2024-06-12
SIAM Journal on Control and Optimization
Abstract: SIAM Journal on Control and Optimization, Volume 62, Issue 3, Pages 1590-1614, June 2024. In this paper, we study the optimal stopping problem in the so-called exploratory framework, in which the agent acts randomly conditional on the current state and a regularization term is added to the reward functional. This transformation reduces the optimal stopping problem to a standard optimal control problem. For the American put option model, we derive the associated Hamilton-Jacobi-Bellman (HJB) equation and prove its solvability. Furthermore, we give a convergence rate for policy iteration and compare our solution with that of the classical American put option problem. Our results indicate that the choice of the temperature constant involves a trade-off between convergence rate and bias. Based on the theoretical analysis, we design a reinforcement learning algorithm and present numerical results for several models.
mathematics, applied; automation & control systems