Abstract: We develop a new continuous-time stochastic gradient descent method for optimizing over the stationary distribution of stochastic differential equation (SDE) models. The algorithm continuously updates the SDE model's parameters using an estimate of the gradient of the objective with respect to those parameters, where the objective is evaluated under the stationary distribution. The gradient estimate is simultaneously updated using forward propagation of the SDE state derivatives, asymptotically converging to the direction of steepest descent. We rigorously prove convergence of the online forward propagation algorithm for linear SDE models (i.e., the multi-dimensional Ornstein-Uhlenbeck process) and present numerical results for nonlinear examples. The proof requires analysis of the fluctuations of the parameter evolution around the direction of steepest descent. Bounds on the fluctuations are challenging to obtain due to the online nature of the algorithm (e.g., the stationary distribution continuously changes as the parameters change). We prove bounds for the solutions of a new class of Poisson partial differential equations (PDEs), which are then used to analyze the parameter fluctuations in the algorithm. Our algorithm is applicable to a range of mathematical finance applications involving statistical calibration of SDE models and stochastic optimal control over long time horizons, where ergodicity of the data and of the stochastic process is a suitable modeling framework. Numerical examples explore these potential applications, including learning a neural network control for high-dimensional optimal control of SDEs and training stochastic point process models of limit order book events.
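To make the idea of online forward propagation concrete, the following is a minimal sketch for a one-dimensional Ornstein-Uhlenbeck process, not the authors' implementation. Everything specific here is an assumption made for illustration: the model dX = (theta - X) dt + sigma dW, the objective (E[X] - beta)^2 under the stationary distribution, the learning rate 1/(1 + t), the Euler-Maruyama discretization, the use of an independent copy `x_bar` to decorrelate the two factors of the gradient estimate, and the hypothetical function name `run_online_sgd`.

```python
import numpy as np


def run_online_sgd(beta=2.0, sigma=0.5, dt=0.01, n_steps=50_000, seed=0):
    """Online gradient descent on theta for dX = (theta - X) dt + sigma dW.

    Assumed objective: J(theta) = (E_pi[X] - beta)^2, where pi is the
    stationary distribution (here N(theta, sigma^2 / 2)), so the minimizer
    is theta = beta.
    """
    rng = np.random.default_rng(seed)
    # Brownian increments for the process X and an independent copy X_bar.
    dw = rng.standard_normal((n_steps, 2)) * np.sqrt(dt)

    theta = 0.0    # parameter being learned
    x = 0.0        # SDE state
    x_bar = 0.0    # independent copy of the SDE state (assumption)
    x_tilde = 0.0  # forward-propagated derivative dX/dtheta

    for k in range(n_steps):
        t = k * dt
        lr = 1.0 / (1.0 + t)  # assumed decaying learning rate

        # Gradient estimate: 2 * (f(X_bar) - beta) * f'(X) * X_tilde,
        # with f(x) = x, so f'(X) = 1.
        grad_est = 2.0 * (x_bar - beta) * 1.0 * x_tilde

        # Euler-Maruyama steps, all using the current theta.
        x_next = x + (theta - x) * dt + sigma * dw[k, 0]
        x_bar_next = x_bar + (theta - x_bar) * dt + sigma * dw[k, 1]
        # Differentiating the drift (theta - X) with respect to theta
        # gives d(X_tilde) = (1 - X_tilde) dt; the noise term drops out.
        x_tilde_next = x_tilde + (1.0 - x_tilde) * dt

        theta = theta - lr * grad_est * dt
        x, x_bar, x_tilde = x_next, x_bar_next, x_tilde_next

    return float(theta)


if __name__ == "__main__":
    print("learned theta:", run_online_sgd())
```

In this toy setting the parameter, the state, and the state derivative are all advanced in the same loop, so no separate simulation to stationarity is run for each gradient evaluation. With the defaults above, the returned `theta` should settle near `beta`.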

Infinite-horizon gradient estimation for semi-Markov decision processes

A Basic Formula for Performance Gradient Estimation of Semi-Markov Decision Processes

Infinite-Horizon Policy-Gradient Estimation with Variable Discount Factor for Markov Decision Process

On-Line Policy Gradient Estimation with Multi-Step Sampling.

The policy gradient estimation of continuous-time hidden Markov decision processes

An improvement of policy gradient estimation algorithms

A policy gradient approach for Finite Horizon Constrained Markov Decision Processes

Two Time-Scale Gradient Approximation Algorithm For Adaptive Markov Reward Processes

Continuous-time stochastic gradient descent for optimizing over the stationary distribution of stochastic differential equations

Two-Timescale Simulation-based Algorithm for Markov Decision Process Based on Performance Potentials

Variance-Reduced Policy Gradient Approaches for Infinite Horizon Average Reward Markov Decision Processes

On performance potentials and conditional Monte Carlo for gradient estimation for Markov chains

The Policy Gradient Estimation for Continuous-Time Partially Observable Markovian Decision Processes

A unified approach for semi-Markov decision processes with discounted and average reward criteria

On Markov Chain Gradient Descent

Recursive Approaches for Single Sample Path Based Markov Reward Processes

An Efficient High-Dimensional Gradient Estimator for Stochastic Differential Equations

An Inverse Reinforcement Learning Algorithm for Semi-Markov Decision Processes

Statistical Inference for Online Decision Making via Stochastic Gradient Descent

Finite-horizon Optimality for Continuous-Time Markov Decision Processes with Unbounded Transition Rates

Parametric estimation of stochastic differential equations via online gradient descent