Abstract: SIAM Journal on Control and Optimization, Volume 62, Issue 1, Pages 135-166, February 2024. This work uses the entropy-regularized relaxed stochastic control perspective as a principled framework for designing reinforcement learning (RL) algorithms. Herein, an agent interacts with the environment by generating noisy controls distributed according to the optimal relaxed policy. The noisy policies, on the one hand, explore the space and hence facilitate learning, but, on the other hand, they introduce bias by assigning a positive probability to nonoptimal actions. This exploration-exploitation trade-off is determined by the strength of entropy regularization. We study algorithms resulting from two entropy regularization formulations: the exploratory control approach, where entropy is added to the cost objective, and the proximal policy update approach, where entropy penalizes policy divergence between consecutive episodes. We focus on the finite horizon continuous-time linear-quadratic (LQ) RL problem, where a linear dynamics with unknown drift coefficients is controlled subject to quadratic costs. In this setting, both algorithms yield a Gaussian relaxed policy. We quantify the precise difference between the value functions of a Gaussian policy and its noisy evaluation and show that the execution noise must be independent across time. By tuning the frequency of sampling from relaxed policies and the parameter governing the strength of entropy regularization, we prove that the regret, for both learning algorithms, is of the order $\mathcal{O}(\sqrt{N})$ (up to a logarithmic factor) over $N$ episodes, matching the best known result from the literature.
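
The bias that a noisy Gaussian policy introduces can be illustrated with a minimal sketch. It is not the algorithm analyzed in the paper: the function `average_cost`, the scalar dynamics, the fixed feedback gain `k`, and all parameter values are assumptions chosen for illustration. The sketch simulates a one-dimensional linear state with running quadratic cost, draws the action from a Gaussian centered at `k * x` with fresh independent noise at every grid point, and averages the cost over sample paths.

```python
import random


def average_cost(policy_std, n_paths=200, n_steps=50, horizon=1.0,
                 a=0.0, b=1.0, sigma=0.0, k=-1.0, q=1.0, r=1.0,
                 x0=1.0, seed=0):
    """Monte Carlo estimate of the running quadratic cost when the action
    is sampled as u ~ N(k * x, policy_std**2) on a uniform time grid.

    All names and default values are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    total = 0.0
    for _ in range(n_paths):
        x, cost = x0, 0.0
        for _ in range(n_steps):
            # Fresh action noise at each grid point, independent across time.
            u = k * x + policy_std * rng.gauss(0.0, 1.0)
            # Left-point Riemann sum of the running cost q*x^2 + r*u^2.
            cost += (q * x * x + r * u * u) * dt
            # Euler-Maruyama step for dX = (a*X + b*u) dt + sigma dW.
            dw = rng.gauss(0.0, dt ** 0.5)
            x += (a * x + b * u) * dt + sigma * dw
        total += cost
    return total / n_paths


if __name__ == "__main__":
    greedy = average_cost(policy_std=0.0)
    noisy = average_cost(policy_std=1.0)
    print(f"cost without exploration noise: {greedy:.3f}")
    print(f"cost with exploration noise:    {noisy:.3f}")
```

In this toy setting, a larger `policy_std` raises the average cost mainly through the `r * u * u` term, which is the price of exploration; in the framework described above, the strength of entropy regularization is what controls how much such noise the policy carries.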

Regularity and stability of feedback relaxed controls

Direct control method for improving stability and reliability of nonlinear stochastic dynamical systems

Backward Stochastic Control System with Entropy Regularization

Exploratory Optimal Stopping: A Singular Control Formulation

Optimal Scheduling of Entropy Regularizer for Continuous-Time Linear-Quadratic Reinforcement Learning

Continuous Control of Conservatism for Robust Optimization by Adjustable Regret

Pathwise Relaxed Optimal Control of Rough Differential Equations

Optimal scheduling of entropy regulariser for continuous-time linear-quadratic reinforcement learning

Global Finite-Time Output-Feedback Stabilization of Nonlinear Systems Under Relaxed Conditions

Control Regularization for Reduced Variance Reinforcement Learning

Benign Nonconvex Landscapes in Optimal and Robust Control, Part I: Global Optimality

Online Stackelberg Optimization via Nonlinear Control

On the stability of Lipschitz continuous control problems and its application to reinforcement learning

Optimal control of conditioned processes with feedback controls

Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems

Relaxation of Optimal Control Problem Governed by Semilinear Elliptic Equation with Leading Term Containing Controls

Regularity Properties of Optimization-Based Controllers

Safe Non-Stochastic Control of Linear Dynamical Systems

Existence and Nonexistence Results of an Optimal Control Problem by Using Relaxed Control

Robust Stabilization and H∞ Control for Stochastic Systems with Parameter Uncertainty and Nonlinearity

Stabilization of Stochastic Nonholonomic Systems