Abstract: The linear-quadratic regulator (LQR) is a landmark problem in optimal control and is the subject of this paper. LQR is generally classified into state-feedback LQR (SLQR) and output-feedback LQR (OLQR), depending on whether the full state is available. The existing literature suggests that both SLQR and OLQR can be viewed as \textit{constrained nonconvex matrix optimization} problems in which the only variable to be optimized is the feedback gain matrix. In this paper, we introduce a first-order accelerated optimization framework for the LQR problem and give its convergence analysis for SLQR and OLQR separately. Specifically, we establish a Lipschitz Hessian property of the LQR performance criterion, which turns out to be crucial for applying modern optimization techniques. For the SLQR problem, a continuous-time hybrid dynamic system is introduced, whose solution trajectory is shown to converge exponentially to the optimal feedback gain with Nesterov-optimal order $1-\frac{1}{\sqrt{\kappa}}$ (where $\kappa$ is the condition number). The symplectic Euler scheme is then used to discretize the hybrid dynamic system, and a Nesterov-type method with a restarting rule is proposed that preserves the continuous-time convergence rate, i.e., the discretized algorithm attains the Nesterov-optimal convergence order. For the OLQR problem, a Hessian-free accelerated framework is proposed, which is a two-procedure method consisting of semiconvex function optimization and negative curvature exploitation. The method finds an $\epsilon$-stationary point of the performance criterion in time $\mathcal{O}(\epsilon^{-7/4}\log(1/\epsilon))$, which improves upon the $\mathcal{O}(\epsilon^{-2})$ complexity of vanilla gradient descent. Moreover, the method provides a second-order guarantee for the stationary point it returns.
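To make the gain-as-decision-variable viewpoint concrete, the following is a minimal sketch on a scalar toy problem; it is not the algorithm analyzed in the paper. It assumes a hypothetical plant $\dot{x} = ax + bu$ with feedback $u = -kx$ and cost $\int_0^\infty (qx^2 + ru^2)\,dt$, for which $J(k) = (q + rk^2)x_0^2 / (2(bk - a))$ on the stabilizing set $bk > a$. The optimizer is a generic Nesterov-type iteration with a function-value restart, used here as a stand-in for the restarting rule mentioned above. All names and numerical values (`A`, `B`, `Q`, `R`, `X0`, `cost`, `grad`, `accelerated_search`, the step size) are illustrative assumptions.

```python
import math

# Hypothetical scalar plant x' = A*x + B*u with u = -k*x (illustrative values).
A, B, Q, R, X0 = 1.0, 1.0, 1.0, 1.0, 1.0


def cost(k):
    """LQR cost J(k); infinite when the gain k is not stabilizing."""
    decay = B * k - A  # closed-loop decay rate, must be positive
    if decay <= 0.0:
        return math.inf
    return (Q + R * k * k) * X0 * X0 / (2.0 * decay)


def grad(k):
    """Derivative dJ/dk, valid only on the stabilizing set B*k > A."""
    decay = B * k - A
    num = R * B * k * k - 2.0 * R * A * k - Q * B
    return X0 * X0 * num / (2.0 * decay * decay)


def accelerated_search(k0, step=0.25, iters=500):
    """Nesterov-type iteration with a function-value restart (a sketch).

    The step size is assumed small enough that a plain gradient step
    from a stabilizing gain stays stabilizing and lowers the cost.
    """
    k, y, t = k0, k0, 1.0
    for _ in range(iters):
        if math.isinf(cost(y)):
            # Extrapolated point left the stabilizing set: drop the momentum.
            y, t = k, 1.0
        k_next = y - step * grad(y)
        if cost(k_next) > cost(k):
            # Restart: discard momentum and take a plain gradient step.
            t = 1.0
            k_next = k - step * grad(k)
        t_next = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))
        y = k_next + ((t - 1.0) / t_next) * (k_next - k)
        k, t = k_next, t_next
    return k


# Closed-form optimal gain from the scalar Riccati equation, for comparison.
k_star = (A + math.sqrt(A * A + B * B * Q / R)) / B
k_hat = accelerated_search(3.0)
```

In the scalar case the constraint is simply $bk > a$; in the matrix case the analogous constraint is that the closed-loop matrix be stable, and the cost is evaluated through a Lyapunov equation rather than a closed form.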

Handling Heterogeneous Curvatures in Bandit LQR Control

Tight Rates for Bandit Control Beyond Quadratics

Adaptive Bandit Convex Optimization with Heterogeneous Curvature

Regret Analysis of Policy Optimization over Submanifolds for Linearly Constrained Online LQG

Observation-based Optimal Control Law Learning with LQR Reconstruction

Second Order Methods for Bandit Optimization and Control

The Power of Linear Controllers in LQR Control

Direct Data-Driven Discounted Infinite Horizon Linear Quadratic Regulator with Robustness Guarantees

Regret-Optimal LQR Control

Suboptimality analysis of receding horizon quadratic control with unknown linear systems and its applications in learning-based control

Asynchronous Heterogeneous Linear Quadratic Regulator Design

Revisiting LQR Control from the Perspective of Receding-Horizon Policy Gradient

LQR Control with Sparse Adversarial Disturbances

Accelerated Optimization Landscape of Linear-Quadratic Regulator

Stochastic Linear Quadratic Regulators with Indefinite Control Weight Costs. II

Stronger Regret Bounds for Safe Online Reinforcement Learning in the Linear Quadratic Regulator

Infinite-horizon Risk-constrained Linear Quadratic Regulator with Average Cost

Robust Reinforcement Learning for Risk-Sensitive Linear Quadratic Gaussian Control

Benign Nonconvex Landscapes in Optimal and Robust Control, Part I: Global Optimality

On Irregular Linear Quadratic Control: Stochastic Case