Deep Penalty Methods: A Class of Deep Learning Algorithms for Solving High Dimensional Optimal Stopping Problems

Yunfei Peng, Pengyu Wei, Wei Wei
2024-05-19
Abstract: We propose a deep learning algorithm for high dimensional optimal stopping problems. Our method is inspired by the penalty method for solving free boundary PDEs. Within our approach, the penalized PDE is approximated using the Deep BSDE framework proposed by \cite{weinan2017deep}, which leads us to coin the term "Deep Penalty Method (DPM)" to refer to our algorithm. We show that the error of the DPM can be bounded by the loss function and $O(\frac{1}{\lambda})+O(\lambda h) +O(\sqrt{h})$, where $h$ is the step size in time and $\lambda$ is the penalty parameter. This finding emphasizes the need for careful consideration when selecting the penalization parameter and suggests that the discretization error converges at a rate of order $\frac{1}{2}$. We validate the efficacy of the DPM through numerical tests conducted on a high-dimensional optimal stopping model in the area of American option pricing. The numerical tests confirm both the accuracy and the computational efficiency of our proposed algorithm.
Mathematical Finance, Computational Finance
What problem does this paper attempt to address?
This paper addresses high-dimensional optimal stopping problems, such as American option pricing, with an algorithm that combines deep learning and the penalty method. The proposed algorithm, the "Deep Penalty Method (DPM)", is inspired by penalty methods for solving free boundary partial differential equations (PDEs). DPM approximates the continuous-time optimal stopping problem by randomizing the stopping time, thereby avoiding the accumulation of optimization errors.
In the error analysis, the authors prove that the error of DPM is bounded by the loss function plus terms that depend on the penalty parameter λ and the time step size h, namely O(1/λ) + O(λh) + O(√h). Because the first term shrinks and the second grows as λ increases, λ and h must be balanced: with λ = 1/√h, both terms are O(√h), so the discretization error converges at rate O(√h). Numerical experiments on high-dimensional (up to 100 dimensions) American index option pricing problems support the effectiveness and accuracy of DPM.
The paper also reviews other methods for solving optimal stopping problems, such as binomial trees, penalty methods, policy iteration, least squares Monte Carlo, and stochastic mesh methods, and points out that they become impractical in high dimensions. In contrast, neural network-based algorithms such as the Deep Backward Stochastic Differential Equation (Deep BSDE) method have shown potential for handling high-dimensional dynamic models.
In summary, the paper targets the computational challenges of high-dimensional optimal stopping problems and contributes a new numerical algorithm, the deep penalty method, which avoids accumulating optimization errors and performs well in both the theoretical analysis and the numerical tests.
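To make the penalty idea concrete, here is a minimal one-dimensional sketch. It is not the DPM itself: the Deep BSDE network is swapped for a plain binomial tree so that the example stays small and checkable. The penalty term `lam * h * max(g - c, 0)` is one common form of penalization and is an assumption here, since this summary does not state the paper's exact formulation. The function name `binomial_put_prices` and all parameter values are hypothetical.

```python
import numpy as np


def _continuation(values, p, disc):
    """Discounted risk-neutral expectation one step back on the tree."""
    return disc * (p * values[1:] + (1.0 - p) * values[:-1])


def binomial_put_prices(s0=100.0, strike=100.0, r=0.05, sigma=0.2,
                        maturity=1.0, n_steps=400, lam=None):
    """Return (european, penalized, american) put prices on one CRR tree.

    Illustrative only: a 1-D stand-in for the penalized problem, with an
    assumed penalty term lam * (payoff - value)^+ applied explicitly.
    """
    h = maturity / n_steps
    if lam is None:
        lam = 1.0 / np.sqrt(h)  # balances the O(1/lam) and O(lam * h) terms
    if lam * h > 1.0:
        raise ValueError("explicit penalty step needs lam * h <= 1")

    u = np.exp(sigma * np.sqrt(h))
    d = 1.0 / u
    disc = np.exp(-r * h)
    p = (np.exp(r * h) - d) / (u - d)  # risk-neutral up probability

    # Terminal payoff; index j counts the number of up moves.
    j = np.arange(n_steps + 1)
    payoff = np.maximum(strike - s0 * u**j * d**(n_steps - j), 0.0)
    eur, pen, amer = payoff.copy(), payoff.copy(), payoff.copy()

    for n in range(n_steps - 1, -1, -1):
        j = np.arange(n + 1)
        g = np.maximum(strike - s0 * u**j * d**(n - j), 0.0)
        eur = _continuation(eur, p, disc)                    # no early exercise
        amer = np.maximum(_continuation(amer, p, disc), g)   # hard obstacle
        c = _continuation(pen, p, disc)
        pen = c + lam * h * np.maximum(g - c, 0.0)           # soft penalty
    return float(eur[0]), float(pen[0]), float(amer[0])


if __name__ == "__main__":
    european, penalized, american = binomial_put_prices()
    print(european, penalized, american)
```

In this sketch the penalized value lies between the European and American values computed on the same tree, and raising `lam` (while keeping `lam * h <= 1`) moves it towards the American value. The default `lam = 1 / sqrt(h)` mirrors the balance between the O(1/λ) and O(λh) terms described above.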