Abstract: Recent years have seen significant progress in characterizing the optimization landscape of policy gradient methods for optimal control of linear time-invariant (LTI) systems. Compared with state-feedback control, output-feedback control is more prevalent, since in many practical settings the underlying state of the system is not fully observed. This paper analyzes the optimization landscape of policy gradient methods applied to static output feedback (SOF) control of discrete-time LTI systems with quadratic cost. We begin by establishing key properties of the SOF cost: coercivity, L-smoothness, and an M-Lipschitz continuous Hessian. Although the cost is not convex, we use these properties to derive new results on convergence (at a nearly dimension-free rate) to stationary points for three policy gradient methods: the vanilla policy gradient method, the natural policy gradient method, and the Gauss-Newton method. Moreover, we prove that the vanilla policy gradient method converges linearly to a local minimum when initialized near that minimum. The paper concludes with numerical examples that validate the theoretical findings. These results not only characterize the performance of gradient descent on the SOF problem but also provide insight into the effectiveness of general policy gradient methods in reinforcement learning.
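As a rough illustration of the setting described in the abstract, the sketch below runs model-based vanilla gradient descent on an SOF cost for a small example. It is not taken from the paper. It assumes dynamics x_{t+1} = A x_t + B u_t, output y_t = C x_t, control u_t = -K y_t, and cost J(K) = tr(P_K X0), where X0 is the covariance of the initial state. The gradient expression is the standard LQR policy gradient carried through the chain rule for K C. The matrices, the helper names `dlyap`, `sof_cost_and_grad`, and `policy_gradient`, and the backtracking line search (used here in place of a fixed step size) are all assumptions made for illustration.

```python
import numpy as np

def dlyap(M, W):
    """Solve X = W + M X M^T by vectorization (adequate for small systems)."""
    n = M.shape[0]
    lhs = np.eye(n * n) - np.kron(M, M)
    return np.linalg.solve(lhs, W.reshape(-1)).reshape(n, n)

def sof_cost_and_grad(K, A, B, C, Q, R, X0):
    """Return (J(K), grad J(K)); cost is inf if A - B K C is not Schur stable."""
    A_K = A - B @ K @ C
    if np.max(np.abs(np.linalg.eigvals(A_K))) >= 1.0:
        return np.inf, None
    # P solves P = Q + C'K'RKC + A_K' P A_K (closed-loop value matrix).
    P = dlyap(A_K.T, Q + C.T @ K.T @ R @ K @ C)
    # Sigma solves Sigma = X0 + A_K Sigma A_K' (accumulated state covariance).
    Sigma = dlyap(A_K, X0)
    cost = float(np.trace(P @ X0))
    grad = 2.0 * ((R + B.T @ P @ B) @ K @ C - B.T @ P @ A) @ Sigma @ C.T
    return cost, grad

def policy_gradient(K0, A, B, C, Q, R, X0, iters=500, tol=1e-6):
    """Gradient descent with Armijo backtracking; K0 must be stabilizing."""
    K = K0.copy()
    cost, grad = sof_cost_and_grad(K, A, B, C, Q, R, X0)
    for _ in range(iters):
        g2 = float(np.sum(grad ** 2))
        if np.sqrt(g2) < tol:
            break
        step = 1.0
        while step > 1e-12:
            K_new = K - step * grad
            cost_new, grad_new = sof_cost_and_grad(K_new, A, B, C, Q, R, X0)
            # Destabilizing candidates have infinite cost and are rejected.
            if cost_new <= cost - 1e-4 * step * g2:
                break
            step *= 0.5
        else:
            break  # no acceptable step found
        K, cost, grad = K_new, cost_new, grad_new
    return K, cost, grad

# Hypothetical example: open-loop stable plant, so K0 = 0 is stabilizing.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q, R, X0 = np.eye(2), np.eye(1), np.eye(2)
K0 = np.zeros((1, 1))

K_star, cost_star, grad_star = policy_gradient(K0, A, B, C, Q, R, X0)
print("gain:", K_star, "cost:", cost_star)
```

The line search keeps every iterate inside the set of stabilizing gains, which is where coercivity of the cost matters. Because the SOF cost is not convex, the returned gain is only guaranteed to be near a stationary point.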

Data-Driven Policy Gradient Method for Optimal Output Feedback Control of LQR

On the Optimization Landscape of Dynamic Output Feedback Linear Quadratic Control

On the Optimization Landscape of Dynamic Output Feedback: A Case Study for Linear Quadratic Regulator

Globally Convergent Policy Gradient Methods for Linear Quadratic Control of Partially Observed Systems

Towards a Theoretical Foundation of Policy Optimization for Learning Control Policies

An efficient data-based off-policy Q-learning algorithm for optimal output feedback control of linear systems

Optimization Landscape of Policy Gradient Methods for Discrete-time Static Output Feedback

Toward a Theoretical Foundation of Policy Optimization for Learning Control Policies

A Combined Policy Gradient and Q-learning Method for Data-driven Optimal Control Problems

Policy Gradient Converges to the Globally Optimal Policy for Nearly Linear-Quadratic Regulators

Stability-Certified On-Policy Data-Driven LQR via Recursive Learning and Policy Gradient

Policy Gradient Methods for Discrete Time Linear Quadratic Regulator With Random Parameters

Global Convergence of Policy Gradient Primal-dual Methods for Risk-constrained LQRs

Policy Gradient-based Model Free Optimal LQG Control with a Probabilistic Risk Constraint

On the Global Optimality of Direct Policy Search for Nonsmooth $H_\infty$ Output-Feedback Control

Data-Driven LQR with Finite-Time Experiments via Extremum-Seeking Policy Iteration

Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States