Abstract: SIAM Journal on Control and Optimization, Volume 62, Issue 1, Pages 135-166, February 2024. This work uses the entropy-regularized relaxed stochastic control perspective as a principled framework for designing reinforcement learning (RL) algorithms. In this framework, an agent interacts with the environment by generating noisy controls distributed according to the optimal relaxed policy. On the one hand, the noisy policies explore the space and hence facilitate learning; on the other hand, they introduce bias by assigning positive probability to nonoptimal actions. This exploration-exploitation trade-off is determined by the strength of entropy regularization. We study algorithms resulting from two entropy regularization formulations: the exploratory control approach, where entropy is added to the cost objective, and the proximal policy update approach, where entropy penalizes policy divergence between consecutive episodes. We focus on the finite horizon continuous-time linear-quadratic (LQ) RL problem, where linear dynamics with unknown drift coefficients are controlled subject to quadratic costs. In this setting, both algorithms yield a Gaussian relaxed policy. We quantify the precise difference between the value functions of a Gaussian policy and its noisy evaluation and show that the execution noise must be independent across time. By tuning the frequency of sampling from relaxed policies and the parameter governing the strength of entropy regularization, we prove that the regret, for both learning algorithms, is of the order [math] (up to a logarithmic factor) over [math] episodes, matching the best known result from the literature.
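As a rough illustration of what executing a Gaussian relaxed policy means in practice, the sketch below simulates a scalar LQ system on a time grid and draws a fresh, independent action sample at every grid point. It is only a minimal sketch under stated assumptions and is not the paper's algorithm: the function name `rollout_cost`, the scalar model, the Euler discretization, and all coefficient values (`A`, `B`, `Q`, `R`, `G`, the feedback gain `K`, and the policy variance `var`) are hypothetical choices made for illustration.

```python
import numpy as np


def rollout_cost(K, var, n_steps=50, n_paths=2000, T=1.0, x0=1.0,
                 A=0.5, B=1.0, sigma=0.0, Q=1.0, R=1.0, G=1.0, seed=0):
    """Monte Carlo estimate of the cost of executing a ~ N(K * x, var).

    Assumed (hypothetical) scalar model, Euler-discretized:
        dX = (A X + B a) dt + sigma dW,
        cost = int_0^T (Q X^2 + R a^2) dt + G X_T^2.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_paths, x0, dtype=float)
    cost = np.zeros(n_paths)
    for _ in range(n_steps):
        # Fresh execution noise at every sampling time, independent across time.
        a = K * x + np.sqrt(var) * rng.standard_normal(n_paths)
        cost += (Q * x**2 + R * a**2) * dt
        dW = np.sqrt(dt) * rng.standard_normal(n_paths)
        x = x + (A * x + B * a) * dt + sigma * dW
    cost += G * x**2  # terminal cost
    return cost.mean()


# Same feedback gain, executed without and with exploration noise.
det_cost = rollout_cost(K=-1.0, var=0.0)
noisy_cost = rollout_cost(K=-1.0, var=0.5)
print(f"deterministic: {det_cost:.4f}, noisy: {noisy_cost:.4f}")
```

In this toy setup the noisy execution costs more than the deterministic one with the same gain, which is one way to see the bias that exploration introduces; a larger `var` means more exploration and a larger gap.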

Exploratory Optimal Stopping: A Singular Control Formulation

Randomized Optimal Stopping Problem in Continuous Time and Reinforcement Learning Algorithm

Learning to Optimally Stop a Diffusion Process

From the Optimal Singular Stochastic Control to the Optimal Stopping for Regime-Switching Processes

Randomized Policy Optimization for Optimal Stopping

Sequential Design for Optimal Stopping Problems

The Research on Singular Stochastic Control Problem with Stopping

Data-driven optimal stopping: A pure exploration analysis

Optimal Stopping Problems with Restricted Stopping Times

Robust optimal stopping with regime switching

Characterization of Stochastic Control with Optimal Stopping in a Sobolev Space

The Research on a Class of Optimal Control Strategy Problem with Stopping

Backward Stochastic Control System with Entropy Regularization

A Problem of Singular Stochastic Control with Optimal Stopping in Finite Horizon

Optimal Stopping under Model Ambiguity: a Time-Consistent Equilibrium Approach

Regularity and stability of feedback relaxed controls

A Nonparametric Algorithm for Optimal Stopping Based on Robust Optimization

Optimal Scheduling of Entropy Regularizer for Continuous-Time Linear-Quadratic Reinforcement Learning

On an Optimal Stopping Problem with a Discontinuous Reward

Deep Reinforcement Learning for Optimal Stopping with Application in Financial Engineering