Abstract: Direct policy search has achieved great empirical success in reinforcement learning. Many recent studies have revisited its theoretical foundation for continuous control, which reveals elegant nonconvex geometry in various benchmark problems, especially in fully observable state-feedback cases. This paper considers two fundamental optimal and robust control problems with partial observability: the Linear Quadratic Gaussian (LQG) control with stochastic noises, and $\mathcal{H}_\infty$ robust control with adversarial noises. In the policy space, the former problem is smooth but nonconvex, while the latter one is nonsmooth and nonconvex. We highlight some interesting and surprising "discontinuity" of LQG and $\mathcal{H}_\infty$ cost functions around the boundary of their domains. Despite the lack of convexity (and possibly smoothness), our main results show that for a class of non-degenerate policies, all Clarke stationary points are globally optimal and there is no spurious local minimum for both LQG and $\mathcal{H}_\infty$ control. Our proof techniques rely on a new and unified framework of Extended Convex Lifting (ECL), which reconciles the gap between nonconvex policy optimization and convex reformulations. This ECL framework is of independent interest, and we will discuss its details in Part II of this paper.

What problem does this paper attempt to address?

This paper studies the nonconvex optimization problems that arise in two fundamental optimal and robust control problems with partial observability:

1. **Linear Quadratic Gaussian (LQG) Control**: whether optimizing directly over dynamic policies can reach a globally optimal controller in the presence of stochastic noise.
2. **H∞ Robust Control**: whether optimizing directly over dynamic policies can reach a globally optimal controller in the presence of adversarial noise.

### Problem Background

In classical control theory, a common way to handle nonconvex problems is to reparameterize them (for example, through a suitable change of variables) into a convex form that efficient algorithms can solve. However, when the system model is unknown, complex, or insufficiently parameterized, direct policy optimization becomes another viable option for controller design. Direct policy optimization is conceptually simpler, computationally more flexible, and better suited to learning-based control, but it naturally leads to nonconvex optimization problems, which makes theoretical guarantees or certificates hard to obtain.

### Paper Contributions

The paper studies the nonconvex optimization landscapes of LQG control and H∞ robust control from the perspective of modern nonconvex optimization. It proves that, for a large class of non-degenerate dynamic policies, all stationary points are globally optimal and there are no spurious local minima. The specific contributions are:

1. **Smooth Nonconvex Landscape of LQG Control**:
   - Although the LQG cost function is analytic within its domain, the paper shows that it exhibits intricate "discontinuous" behavior near the boundary of that domain.
   - It is proved that if the limit policy corresponds to a controllable and observable controller, the LQG cost always diverges to infinity as that limit is approached.
   - The main technical result shows that, within the class of non-degenerate policies, any stationary point of the LQG cost is globally optimal, so there are no spurious stationary points.
2. **Nonsmooth Nonconvex Landscape of H∞ Robust Control**:
   - The H∞ cost function is nonsmooth, which complicates the analysis.
   - Nevertheless, the paper shows that many landscape properties of LQG control have nonsmooth counterparts in H∞ control.
   - It is proved that all Clarke stationary points in the set of non-degenerate policies are globally optimal for H∞ robust control, so that set contains no spurious stationary points.

### Technical Framework

To bridge the gap between nonconvex policy optimization and convex reformulations, the paper introduces a new unified framework, Extended Convex Lifting (ECL). The framework uses Lyapunov variables and similarity transformations, and it applies to a wide range of control problems, both smooth and nonsmooth.

### Related Work

The paper reviews the relevant literature, including policy optimization for LQ control, direct policy optimization for nonsmooth robust control, and benign nonconvex landscapes in machine learning. These studies supply theoretical foundations and technical tools for understanding nonconvex optimization problems.

### Conclusion

Through the ECL framework, the paper analyzes the nonconvex optimization landscapes of LQG control and H∞ robust control and establishes global optimality properties for these problems under dynamic policies. The results matter for control theory and also offer a new perspective for research in modern reinforcement learning and nonconvex optimization.
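
The landscape features described in this summary (a nonconvex domain, invariance under similarity transformations, and a cost that blows up near the boundary) can be seen in a toy example. The following is a minimal sketch, not the paper's method: it assumes a scalar continuous-time plant `dx = (a x + b u) dt + dw`, `y = c x + v` with a scalar dynamic controller `(AK, BK, CK)`, and it evaluates the LQG cost through the closed-loop Lyapunov equation. The function names `solve3` and `lqg_cost`, the plant values, and the unit weights are all hypothetical choices made for illustration.

```python
import math


def solve3(M, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    A = [row[:] + [b] for row, b in zip(M, rhs)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n + 1):
                A[r][k] -= f * A[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (A[i][n] - s) / A[i][i]
    return x


def lqg_cost(AK, BK, CK, a=1.0, b=1.0, c=1.0, q=1.0, r=1.0, W=1.0, V=1.0):
    """LQG cost of the scalar dynamic controller (AK, BK, CK).

    Assumed setup (illustrative only): plant parameters a, b, c, state and
    input weights q, r, and noise intensities W, V. Returns math.inf when
    the closed loop is not stable, i.e. outside the domain of the cost.
    """
    # Closed-loop state matrix for the stacked state (x, xi).
    a11, a12, a21, a22 = a, b * CK, BK * c, AK
    # A 2x2 matrix is Hurwitz iff its trace is negative and its determinant positive.
    if not (a11 + a22 < 0 and a11 * a22 - a12 * a21 > 0):
        return math.inf
    # Lyapunov equation A X + X A^T + diag(W, BK^2 V) = 0 for symmetric X,
    # written as a linear system in the unknowns (x11, x12, x22).
    M = [
        [2 * a11, 2 * a12, 0.0],
        [a21, a11 + a22, a12],
        [0.0, 2 * a21, 2 * a22],
    ]
    rhs = [-W, 0.0, -BK * BK * V]
    x11, _x12, x22 = solve3(M, rhs)
    # Stationary cost E[q x^2 + r u^2] with u = CK * xi.
    return q * x11 + r * CK * CK * x22


# Two stabilizing controllers related by the similarity transformation T = -1.
J_plus = lqg_cost(-5.0, 3.0, -3.0)
J_minus = lqg_cost(-5.0, -3.0, 3.0)
# Their midpoint (AK, 0, 0) leaves the unstable plant in open loop.
J_mid = lqg_cost(-5.0, 0.0, 0.0)
print(J_plus, J_minus, J_mid)

# Approach the stability boundary BK * CK = AK from inside the domain.
for eps in (1e-1, 1e-2, 1e-3):
    print(eps, lqg_cost(-5.0, 3.0, -(5.0 + eps) / 3.0))
```

In this sketch the two controllers give the same cost while their midpoint is not stabilizing, so the set of stabilizing policies is not convex. The final loop moves toward a boundary point whose limit controller has `BK` and `CK` both nonzero (controllable and observable in the scalar case), and the printed costs grow as `eps` shrinks, in line with the divergence result quoted in the summary.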

Benign Nonconvex Landscapes in Optimal and Robust Control, Part I: Global Optimality

Benign Nonconvex Landscapes in Optimal and Robust Control, Part II: Extended Convex Lifting

On the Global Optimality of Direct Policy Search for Nonsmooth $H_\infty$ Output-Feedback Control

Optimal Control for Constrained Discrete-Time Nonlinear Systems Based on Safe Reinforcement Learning

Policy Optimization for $\mathcal{H}_2$ Linear Control with $\mathcal{H}_\infty$ Robustness Guarantee: Implicit Regularization and Global Convergence

Global Convergence of Direct Policy Search for State-Feedback $\mathcal{H}_\infty$ Robust Control: A Revisit of Nonsmooth Synthesis with Goldstein Subdifferential

Policy Optimization in Control: Geometry and Algorithmic Implications

Analysis of the Optimization Landscape of Linear Quadratic Gaussian (LQG) Control

On the Optimization Landscape of Dynamic Output Feedback Linear Quadratic Control

Robust Policy Optimization in Continuous-time Mixed $\mathcal{H}_2/\mathcal{H}_\infty$ Stochastic Control

LMI Approach to Optimal Guaranteed Cost Control for a Class of Linear Uncertain Discrete Systems

Safe Non-Stochastic Control of Control-Affine Systems: An Online Convex Optimization Approach

Towards a Theoretical Foundation of Policy Optimization for Learning Control Policies

Robust Reinforcement Learning for Risk-Sensitive Linear Quadratic Gaussian Control

Convexity and monotonicity in nonlinear optimal control under uncertainty

Optimization Landscape of Policy Gradient Methods for Discrete-Time Static Output Feedback

Convexity in Optimal Control Problems

Robust affine control of linear stochastic systems

Policy Gradient Converges to the Globally Optimal Policy for Nearly Linear-Quadratic Regulators