SIAM Journal on Scientific Computing, Volume 46, Issue 5, Pages C535–C556, October 2024.

Abstract: We present a neural network approach for approximating the value function of high-dimensional stochastic control problems. Our training process simultaneously updates our value function estimate and identifies the part of the state space likely to be visited by optimal trajectories. Our approach leverages insights from optimal control theory and the fundamental relation between semilinear parabolic partial differential equations and forward-backward stochastic differential equations. To focus the sampling on relevant states during neural network training, we use the stochastic Pontryagin maximum principle (PMP) to obtain the optimal controls for the current value function estimate. By design, our approach coincides with the method of characteristics for the nonviscous Hamilton–Jacobi–Bellman (HJB) equation arising in deterministic control problems. Our training loss consists of a weighted sum of the objective functional of the control problem and penalty terms that enforce the HJB equations along the sampled trajectories. Importantly, training is unsupervised in that it does not require solutions of the control problem. Our numerical experiments highlight our scheme's ability to identify the relevant parts of the state space and produce meaningful value estimates. Using a two-dimensional model problem, we demonstrate the importance of the stochastic PMP for informing the sampling and compare it to a finite element approach. With a nonlinear control-affine quadcopter example, we illustrate that our approach can handle complicated dynamics. For a 100-dimensional benchmark problem, we demonstrate that our approach improves accuracy and time-to-solution, and, via a modification, we show the wider applicability of our scheme.
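The core loop described in the abstract can be illustrated with a minimal, stdlib-only Python sketch: feedback controls derived from the current value estimate drive the sampled trajectories, and the loss is a weighted sum of the control objective and HJB penalties evaluated along those trajectories. Everything concrete here is an assumption for illustration, not the authors' released code: a hypothetical 1-D linear-quadratic problem dx = u dt + σ dW with cost E[∫ ½u² dt + ½g x_T²], a closed-form value function standing in for the neural network, central finite differences standing in for automatic differentiation, squared residuals as the penalty form, and the names `exact_value`, `pmp_control`, and `sample_loss`. For this control-affine problem, minimizing the Hamiltonian ½u² + Φ_x u gives the feedback u = −Φ_x, which the sketch uses as the PMP-based control; the HJB equation is Φ_t + ½σ²Φ_xx + min_u [½u² + Φ_x u] = 0 with Φ(x, T) = ½g x².

```python
import math
import random

# Hypothetical 1-D linear-quadratic toy problem (an assumption, not from the paper):
#   dynamics  dx = u dt + SIGMA dW,   cost  E[ int_0^T 0.5*u^2 dt + 0.5*G_WEIGHT*x_T^2 ]
T, SIGMA, G_WEIGHT = 1.0, 0.5, 2.0


def running_cost(x, u):
    return 0.5 * u * u


def terminal_cost(x):
    return 0.5 * G_WEIGHT * x * x


def exact_value(x, t):
    """Closed-form value function of the toy problem (Riccati solution),
    used here as a stand-in for a trained neural network."""
    tau = T - t
    p = G_WEIGHT / (1.0 + G_WEIGHT * tau)
    return 0.5 * p * x * x + 0.5 * SIGMA**2 * math.log(1.0 + G_WEIGHT * tau)


def derivatives(phi, x, t, h=1e-4):
    """Central finite differences of phi; a neural network would use autodiff instead."""
    phi_x = (phi(x + h, t) - phi(x - h, t)) / (2.0 * h)
    phi_xx = (phi(x + h, t) - 2.0 * phi(x, t) + phi(x - h, t)) / (h * h)
    phi_t = (phi(x, t + h) - phi(x, t - h)) / (2.0 * h)
    return phi_x, phi_xx, phi_t


def pmp_control(phi_x):
    """Minimizer of the Hamiltonian 0.5*u^2 + phi_x*u for this control-affine toy."""
    return -phi_x


def sample_loss(phi, x0, n_paths=500, n_steps=50, weights=(1.0, 1.0, 1.0), seed=0):
    """Monte Carlo estimate of objective + HJB penalty + terminal-condition penalty.

    Trajectories are driven by the feedback control obtained from the current
    value estimate phi, so the penalties are evaluated where those trajectories go.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    objective = hjb_pen = term_pen = 0.0
    for _ in range(n_paths):
        x, cost = x0, 0.0
        for k in range(n_steps):
            t = k * dt
            phi_x, phi_xx, phi_t = derivatives(phi, x, t)
            u = pmp_control(phi_x)
            # HJB residual: phi_t + 0.5*sigma^2*phi_xx + [L(x,u) + phi_x*f(x,u)] at the minimizer u
            residual = phi_t + 0.5 * SIGMA**2 * phi_xx + running_cost(x, u) + phi_x * u
            hjb_pen += residual**2 * dt
            cost += running_cost(x, u) * dt
            # Euler-Maruyama step for dx = u dt + sigma dW
            x += u * dt + SIGMA * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        cost += terminal_cost(x)
        objective += cost
        term_pen += (phi(x, T) - terminal_cost(x)) ** 2
    a1, a2, a3 = weights
    out = {
        "objective": objective / n_paths,
        "hjb": hjb_pen / n_paths,
        "terminal": term_pen / n_paths,
    }
    out["loss"] = a1 * out["objective"] + a2 * out["hjb"] + a3 * out["terminal"]
    return out


if __name__ == "__main__":
    exact = sample_loss(exact_value, x0=1.0)
    naive = sample_loss(lambda x, t: 0.5 * x * x, x0=1.0)  # hypothetical poor guess
    print("Phi(x0, 0)     =", exact_value(1.0, 0.0))
    print("exact estimate :", exact)
    print("naive estimate :", naive)
```

Because the toy problem has a known value function, the exact candidate should give near-zero HJB and terminal penalties and an objective estimate close to Φ(x0, 0), while the naive guess Φ = ½x² leaves clearly nonzero penalties. A training loop in this spirit would adjust the network parameters to reduce the weighted loss, resampling trajectories with the updated feedback at each step.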
Reproducibility of computational results. This paper has been awarded the "SIAM Reproducibility Badge: Code and data available" as recognition that the authors have followed reproducibility principles valued by SISC and the scientific computing community. Code and data that allow readers to reproduce the results in this paper are available at https://github.com/EmoryMLIP/NeuralSOC and in the supplementary material (NeuralSOC-main.zip [29.9 MB]).

Potential Based Policy Gradient Approach for Optimal Control of the Stochastic System with Unknown Noise

Optimal Parametric Control of Nonlinear Random Vibrating Systems

Near Optimal Control for a Class of Stochastic Hybrid Systems

A Combined Policy Gradient and Q-learning Method for Data-driven Optimal Control Problems

Policy Gradient Adaptive Dynamic Programming for Model-Free Multi-Objective Optimal Control

Policy Iteration Based Feedback Control

Model-free Optimal Control of Discrete-Time Systems with Additive and Multiplicative Noises

Optimal control of nonlinear system based on deterministic policy gradient with eligibility traces

Distributed Optimal Control of Nonlinear System Based on Policy Gradient with External Disturbance

Sparse optimal control of networks with multiplicative noise via policy gradient

Dynamic Programming-based Approximate Optimal Control for Model-Based Reinforcement Learning

Deterministic policy gradient based optimal control with probabilistic constraints

A Policy Gradient Framework for Stochastic Optimal Control Problems with Global Convergence Guarantee

Learning Optimal Control Policy for Unknown Discrete-Time Systems

Adaptive Dynamic Programming for Optimal Control of Discrete-Time Nonlinear Systems with Trajectory-Based Initial Control Policy

A Neural Network Approach for Stochastic Optimal Control

Policy gradient methods for discrete time linear quadratic regulator with random parameters

Dynamic Event-Triggered Prescribed Performance Control for Partially Unknown Nonlinear System via Adaptive Dynamic Programming

Data-Based Predictive Control Via Multistep Policy Gradient Reinforcement Learning

Neural Stochastic Control

Online Adaptive Optimization Algorithm for Semi-Markov Control Processes