Benign Nonconvex Landscapes in Optimal and Robust Control, Part I: Global Optimality

Yang Zheng, Chih-fan Pai, Yujie Tang
2023-12-24
Abstract: Direct policy search has achieved great empirical success in reinforcement learning. Many recent studies have revisited its theoretical foundation for continuous control, which reveals elegant nonconvex geometry in various benchmark problems, especially in fully observable state-feedback cases. This paper considers two fundamental optimal and robust control problems with partial observability: the Linear Quadratic Gaussian (LQG) control with stochastic noises, and $\mathcal{H}_\infty$ robust control with adversarial noises. In the policy space, the former problem is smooth but nonconvex, while the latter one is nonsmooth and nonconvex. We highlight some interesting and surprising "discontinuity" of LQG and $\mathcal{H}_\infty$ cost functions around the boundary of their domains. Despite the lack of convexity (and possibly smoothness), our main results show that for a class of non-degenerate policies, all Clarke stationary points are globally optimal and there is no spurious local minimum for both LQG and $\mathcal{H}_\infty$ control. Our proof techniques rely on a new and unified framework of Extended Convex Lifting (ECL), which reconciles the gap between nonconvex policy optimization and convex reformulations. This ECL framework is of independent interest, and we will discuss its details in Part II of this paper.
Optimization and Control, Systems and Control, Dynamical Systems
What problem does this paper attempt to address?
This paper addresses the nonconvex optimization problems that arise in two fundamental optimal and robust control problems with partial observability:

1. **Linear Quadratic Gaussian (LQG) Control**: How to reach a globally optimal controller through dynamic policy optimization in the presence of stochastic noise.
2. **H∞ Robust Control**: How to reach a globally optimal controller through dynamic policy optimization in the presence of adversarial noise.

### Problem Background

In classical control theory, a common way to handle nonconvex problems is to re-parameterize them (for example, through a suitable change of variables) into a convex form that efficient algorithms can solve. However, when the system model is unknown, complex, or poorly parameterized, direct policy optimization becomes another viable option for controller design. Direct policy optimization is conceptually simpler, computationally more flexible, and better suited to learning-based control, but it naturally leads to nonconvex optimization problems, which makes theoretical guarantees or certificates hard to obtain.

### Paper Contributions

The main contribution of the paper is to study the nonconvex optimization landscapes of LQG control and H∞ robust control from the perspective of modern nonconvex optimization, and to prove that for a large class of non-degenerate dynamic policies, all stationary points are globally optimal and there are no spurious local minima. The specific contributions are as follows:

1. **Smooth Nonconvex Landscape of LQG Control**:
   - Although the LQG cost function is analytic within its domain, the paper reveals that it exhibits intricate "discontinuous" behavior near the boundary of that domain.
   - It is proved that if the limit policy corresponds to a controllable and observable controller, the corresponding LQG cost always diverges to infinity.
   - The main technical result shows that, within the class of non-degenerate policies, any stationary point is globally optimal for LQG control and there are no spurious stationary points.
2. **Nonsmooth Nonconvex Landscape of H∞ Robust Control**:
   - The cost function of H∞ robust control is nonsmooth, which complicates the analysis.
   - Nevertheless, the paper shows that many landscape properties of LQG control have nonsmooth counterparts in H∞ control.
   - It is proved that all Clarke stationary points in the set of non-degenerate policies are globally optimal for H∞ robust control, so this set contains no spurious stationary points.

### Technical Framework

To bridge the gap between nonconvex policy optimization and convex reformulations, the paper introduces a new unified framework, Extended Convex Lifting (ECL). This framework uses Lyapunov variables and similarity transformations, and it applies to a wide range of control problems, both smooth and nonsmooth.

### Related Work

The paper reviews the relevant literature, including policy optimization for LQ control, direct policy optimization for nonsmooth robust control, and benign nonconvex landscapes in machine learning. These studies provide theoretical foundations and technical tools for understanding nonconvex optimization problems.

### Conclusion

Through the ECL framework, the paper gives a theoretical analysis of the nonconvex optimization landscapes of LQG control and H∞ robust control, and establishes global optimality properties of these problems over dynamic policies. The results are relevant to control theory and also offer a new perspective for research in reinforcement learning and nonconvex optimization.
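
To make the idea of an LQG cost defined over dynamic policies more concrete, the following Python sketch evaluates the stationary LQG cost of a dynamic output-feedback controller `(AK, BK, CK)` by solving a closed-loop Lyapunov equation, and compares a trivial policy with the classical Riccati-based controller. It is only an illustration under stated assumptions: the scalar plant, the noise and cost weights, and the helper names `lqg_cost` and `riccati_controller` are hypothetical choices made here, not taken from the paper. The cost is set to infinity outside the set of stabilizing policies, and it is unchanged when the controller state is rescaled by a similarity transformation.

```python
import numpy as np
from scipy.linalg import block_diag, solve_continuous_are, solve_continuous_lyapunov


def lqg_cost(A, B, C, W, V, Q, R, AK, BK, CK):
    """Stationary LQG cost of the policy (AK, BK, CK); inf if not stabilizing."""
    # Closed loop of plant x' = Ax + Bu + w, y = Cx + v with the
    # controller xi' = AK xi + BK y, u = CK xi.
    A_cl = np.block([[A, B @ CK], [BK @ C, AK]])
    if np.max(np.linalg.eigvals(A_cl).real) >= 0:
        return np.inf  # outside the domain of the cost function
    # Stationary covariance X solves A_cl X + X A_cl^T + noise = 0.
    noise = block_diag(W, BK @ V @ BK.T)
    X = solve_continuous_lyapunov(A_cl, -noise)
    # Cost is the stationary value of E[x^T Q x + u^T R u].
    weight = block_diag(Q, CK.T @ R @ CK)
    return float(np.trace(weight @ X))


def riccati_controller(A, B, C, W, V, Q, R):
    """Classical LQG controller built from two algebraic Riccati equations."""
    P = solve_continuous_are(A, B, Q, R)        # control Riccati equation
    S = solve_continuous_are(A.T, C.T, W, V)    # filtering Riccati equation
    K = np.linalg.solve(R, B.T @ P)             # state-feedback gain
    L = S @ C.T @ np.linalg.inv(V)              # Kalman gain
    return A - B @ K - L @ C, L, -K


# Hypothetical scalar plant and weights (assumptions for illustration only).
A = np.array([[-1.0]])
B = np.array([[1.0]])
C = np.array([[1.0]])
W = np.array([[1.0]])
V = np.array([[1.0]])
Q = np.array([[1.0]])
R = np.array([[1.0]])
plant = (A, B, C, W, V, Q, R)

# A trivial stabilizing policy that applies no control at all.
zero = (np.array([[-1.0]]), np.array([[0.0]]), np.array([[0.0]]))
J_zero = lqg_cost(*plant, *zero)

# The Riccati-based controller.
AK, BK, CK = riccati_controller(*plant)
J_opt = lqg_cost(*plant, AK, BK, CK)

# Same controller after the similarity transformation xi -> T xi.
T = np.array([[3.0]])
T_inv = np.linalg.inv(T)
J_sim = lqg_cost(*plant, T @ AK @ T_inv, T @ BK, CK @ T_inv)

# A controller with unstable internal dynamics lies outside the domain.
J_bad = lqg_cost(*plant, np.array([[1.0]]), np.array([[0.0]]), np.array([[0.0]]))

print(J_zero, J_opt, J_sim, J_bad)
```

In this toy example the Riccati-based controller achieves a strictly lower cost than the zero policy, and `J_opt` and `J_sim` agree up to numerical error, which reflects the fact that many different policy parameters represent the same input-output controller.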