Abstract: The linear-quadratic regulator (LQR) is a landmark problem in optimal control and the subject of this paper. LQR is generally classified as state-feedback LQR (SLQR) or output-feedback LQR (OLQR), depending on whether the full state is available. The existing literature suggests that both SLQR and OLQR can be viewed as \textit{constrained nonconvex matrix optimization} problems in which the only variable to be optimized is the feedback gain matrix. In this paper, we introduce a first-order accelerated optimization framework for the LQR problem and give its convergence analysis for SLQR and OLQR separately. Specifically, we establish a Lipschitz Hessian property of the LQR performance criterion, which turns out to be crucial for applying modern optimization techniques. For the SLQR problem, we introduce a continuous-time hybrid dynamic system whose solution trajectory is shown to converge exponentially to the optimal feedback gain with the Nesterov-optimal order $1-\frac{1}{\sqrt{\kappa}}$, where $\kappa$ is the condition number. The symplectic Euler scheme is then used to discretize the hybrid dynamic system, yielding a Nesterov-type method with a restarting rule that preserves the continuous-time convergence rate; that is, the discretized algorithm attains the Nesterov-optimal convergence order. For the OLQR problem, we propose a Hessian-free accelerated framework, a two-procedure method consisting of semiconvex function optimization and negative curvature exploitation. The method finds an $\epsilon$-stationary point of the performance criterion in time $\mathcal{O}(\epsilon^{-7/4}\log(1/\epsilon))$, which improves upon the $\mathcal{O}(\epsilon^{-2})$ complexity of vanilla gradient descent. Moreover, the method provides a second-order guarantee for the stationary point it returns.
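
To make the "optimize over the feedback gain" viewpoint concrete, the following is a minimal illustrative sketch and not the algorithm analyzed in the paper. It assumes a scalar discrete-time plant (the constants `A`, `B`, `Q`, `R` and all function names are hypothetical), treats the cost as a function of the gain `k` that is finite only on the stabilizing set, and runs a Nesterov-style momentum iteration with a simple function-value restart. The paper's continuous-time hybrid system and symplectic Euler discretization are not reproduced here.

```python
import math

# Hypothetical scalar plant x_{t+1} = A*x_t + B*u_t with stage cost Q*x^2 + R*u^2.
A, B, Q, R = 1.0, 1.0, 1.0, 1.0

def lqr_cost(k):
    """Infinite-horizon cost of u = -k*x from x_0 = 1; inf if k is not stabilizing."""
    a_cl = A - B * k                      # closed-loop pole
    if abs(a_cl) >= 1.0:                  # outside the stabilizing set
        return math.inf
    return (Q + R * k * k) / (1.0 - a_cl * a_cl)

def lqr_grad(k):
    """Derivative of lqr_cost with respect to k (valid for stabilizing k only)."""
    a_cl = A - B * k
    return (2.0 * R * k - 2.0 * B * a_cl * lqr_cost(k)) / (1.0 - a_cl * a_cl)

def accelerated_gain_search(k0, step=0.05, iters=200, tol=1e-10):
    """Momentum iteration on the gain with a function-value restart.

    Assumes k0 is stabilizing and that `step` is small enough for a plain
    gradient step to stay inside the stabilizing set.
    """
    k_prev = k = k0
    t = 0                                 # momentum counter, reset on restart
    for _ in range(iters):
        beta = t / (t + 3.0)              # standard Nesterov momentum weight
        y = k + beta * (k - k_prev)       # look-ahead point
        if lqr_cost(y) == math.inf:       # look-ahead left the feasible set
            y, t = k, 0
        k_next = y - step * lqr_grad(y)
        if lqr_cost(k_next) > lqr_cost(k):
            # Restart: discard momentum and take a plain gradient step.
            t = 0
            k_next = k - step * lqr_grad(k)
        else:
            t += 1
        k_prev, k = k, k_next
        if abs(lqr_grad(k)) < tol:
            break
    return k

if __name__ == "__main__":
    k_star = accelerated_gain_search(1.0)
    print(k_star, lqr_cost(k_star))
```

For these default constants the scalar Riccati equation gives the optimal gain $(\sqrt{5}-1)/2 \approx 0.618$, so the returned value can be checked against a closed-form answer. The restart test compares cost values only, which keeps the sketch Hessian-free.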

On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost

Single Time-scale Actor-critic Method to Solve the Linear Quadratic Regulator with Convergence Guarantees

On the Optimization Landscape of Dynamic Output Feedback: A Case Study for Linear Quadratic Regulator

Stochastic Convergence Results for Regularized Actor-Critic Methods

On the Optimization Landscape of Dynamic Output Feedback Linear Quadratic Control

Error Controlled Actor-Critic

Policy Gradient Converges to the Globally Optimal Policy for Nearly Linear-Quadratic Regulators

A Single-Loop Deep Actor-Critic Algorithm for Constrained Reinforcement Learning with Provable Convergence

Relaxed Actor-Critic with Convergence Guarantees for Continuous-Time Optimal Control of Nonlinear Systems

Online actor‐critic learning control with self‐triggered mechanism for nonlinear regulation problems

Sparse Online Kernelized Actor-Critic Learning in Reproducing Kernel Hilbert Space

A model-free first-order method for linear quadratic regulator with $\tilde{O}(1/\varepsilon)$ sampling complexity

Accelerated Optimization Landscape of Linear-Quadratic Regulator

On the Global Convergence of Natural Actor-Critic with Two-layer Neural Network Parametrization

Improved Sample Complexity for Global Convergence of Actor-Critic Algorithms

Weak Convergence Analysis of Online Neural Actor-Critic Algorithms

Non-Asymptotic Analysis for Single-Loop (Natural) Actor-Critic with Compatible Function Approximation

Actor-Critic Method to Solve the Linear Quadratic Problem of Markov Jump Linear System

Convergence and Robustness of Value and Policy Iteration for the Linear Quadratic Regulator

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality