Handling Heterogeneous Curvatures in Bandit LQR Control

Yu-Hu Yan,Jing Wang,Peng Zhao
2024-01-01
Abstract:We investigate online Linear Quadratic Regulator (LQR) with bandit feedback and semi-adversarial disturbances. Previous works assume costs withhomogeneouscurvatures (i.e., with a uniform strong convexity lower bound), which can be hard to satisfy in many real scenarios and prohibits adapting to true curvatures for better performance. In this paper, we initiate the study of bandit LQR control withheterogeneouscost curvatures, aiming to strengthen the algorithm's adaptivity. To achieve this, we reduce the problem to bandit convex optimization with memory via a ``with-history'' reduction to avoid hard-to-control truncation errors. Then we provide a novel analysis for an importantstabilityterm that appeared in both regret and memory, usingNewton decrementdeveloped in interior-point methods. The analysis enables us to guarantee memory-related terms introduced in the reduction and also provide a simplified analysis for handling heterogeneous curvatures in bandit convex optimization. Finally, we achieve interpolated guarantees that can not only recover existing bounds for convex and quadratic costs but also attain new implications for cases of corrupted and decaying quadraticity.
What problem does this paper attempt to address?