Online Linear Quadratic Tracking with Regret Guarantees

Aren Karapetyan, Diego Bolliger, Anastasios Tsiamis, Efe C. Balta, John Lygeros
DOI: https://doi.org/10.1109/LCSYS.2023.3345809
2024-10-17
Abstract: Online learning algorithms for dynamical systems provide finite time guarantees for control in the presence of sequentially revealed cost functions. We pose the classical linear quadratic tracking problem in the framework of online optimization where the time-varying reference state is unknown a priori and is revealed after the applied control input. We show the equivalence of this problem to the control of linear systems subject to adversarial disturbances and propose a novel online gradient descent based algorithm to achieve efficient tracking in finite time. We provide a dynamic regret upper bound scaling linearly with the path length of the reference trajectory and a numerical example to corroborate the theoretical guarantees.
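One standard way to see why an unknown reference behaves like a disturbance is the change of variables below. It is a sketch under assumptions not stated in the abstract: the system is $x_{t+1} = A x_t + B u_t$, and every reference $r_t$ is an admissible steady state, i.e. $r_t = A r_t + B u^{s}_t$ for some input $u^{s}_t$. It is not necessarily the construction used in the paper.

$$
\begin{aligned}
e_t &= x_t - r_t, \qquad r_t = A r_t + B u^{s}_t,\\
e_{t+1} &= A x_t + B u_t - r_{t+1}
         = A x_t + B u_t - \left(A r_t + B u^{s}_t\right) + \left(r_t - r_{t+1}\right)\\
        &= A e_t + B\left(u_t - u^{s}_t\right) + w_t, \qquad w_t = r_t - r_{t+1}.
\end{aligned}
$$

Because the reference is revealed only after the input is applied, the controller cannot anticipate $w_t$, so it plays the role of an unknown (possibly adversarial) disturbance. Its cumulative size $\sum_t \lVert w_t \rVert = \sum_t \lVert r_{t+1} - r_t \rVert$ is exactly the path length of the reference, which is why a regret bound in terms of path length is natural.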
Systems and Control
What problem does this paper attempt to address?
The paper addresses the online Linear Quadratic Tracking (LQT) problem in which the reference trajectory is not known in advance but is revealed gradually. It reformulates the classical LQT problem as an online optimization problem in which the time-varying reference state is revealed only after the control input has been applied. The authors propose Stable State Online Gradient Descent (SS-OGD), a new algorithm based on Online Gradient Descent (OGD), to achieve efficient tracking in finite time, and they derive an upper bound on its dynamic regret.

### Main Contributions of the Paper:

1. **Problem Reformulation**: Casting LQT as an online optimization problem with an unknown, gradually revealed reference trajectory, and showing that this problem is equivalent to controlling a linear system subject to adversarial disturbances.
2. **New Algorithm**: Proposing SS-OGD, which adapts the standard OGD algorithm to dynamical systems.
3. **Performance Analysis**: Deriving an upper bound on the dynamic regret that scales linearly with the path length of the reference trajectory.
4. **Numerical Validation**: Corroborating the theoretical results with numerical experiments that demonstrate the effectiveness of SS-OGD.

### Key Concepts:

- **Online Linear Quadratic Tracking (LQT)**: Minimizing a quadratic tracking cost when the reference trajectory is revealed only gradually.
- **Dynamic Regret**: The difference between the cumulative cost incurred by the algorithm and that of an optimal benchmark over a finite horizon.
- **Path Length**: The total variation of the reference trajectory over the horizon; a larger path length means the reference changes more.
- **Stable State Online Gradient Descent (SS-OGD)**: The proposed OGD variant, designed to track efficiently in dynamical systems.
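The dynamic regret and path length listed under Key Concepts can be written in a common form as follows. The notation is assumed here and may differ from the paper's: references $r_1, \dots, r_T$, a stage cost $c_t(x, u) = \lVert x - r_t \rVert_Q^2 + \lVert u \rVert_R^2$ revealed at time $t$, a benchmark trajectory $(x_t^\star, u_t^\star)$ (for example, the optimal controller that knows the whole reference in hindsight), and generic constants $C_0, C_1$.

$$
\mathcal{L}_T = \sum_{t=1}^{T-1} \lVert r_{t+1} - r_t \rVert,
\qquad
\mathrm{Regret}_T = \sum_{t=1}^{T} c_t(x_t, u_t) - \sum_{t=1}^{T} c_t(x_t^\star, u_t^\star),
\qquad
\mathrm{Regret}_T \le C_0 + C_1 \, \mathcal{L}_T .
$$

The last inequality only shows the shape of a bound that scales linearly with path length; the paper's exact constants and benchmark class are not reproduced here. Read this way, a reference that eventually settles (bounded $\mathcal{L}_T$) yields bounded regret.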
### Background and Motivation:

- **Traditional LQT Problem**: Assumes the reference trajectory is known a priori; it applies to many real-world scenarios such as aircraft trajectory tracking and industrial process control.
- **Need for Online Control**: In some scenarios the reference trajectory is unknown and revealed gradually, which calls for algorithms that adapt as it unfolds.

### Methods and Results:

- **Algorithm Design**: SS-OGD introduces a stable state feedback term to address the shortcomings of standard OGD when it is applied to dynamical systems.
- **Theoretical Analysis**: Proves that the dynamic regret of SS-OGD admits an upper bound that scales linearly with the path length of the reference trajectory, so slowly varying references lead to low regret.
- **Experimental Validation**: Numerical experiments with a quadrotor UAV model under different reference trajectories corroborate the theoretical guarantees.

### Conclusion:

The paper addresses the online LQT problem in which the reference trajectory is unknown and revealed gradually. The proposed SS-OGD algorithm comes with dynamic regret guarantees and performs well in the numerical experiments. Future research directions include tightening the coefficient of the regret bound and handling reference trajectories generated by unknown dynamics.
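The Methods and Results section describes SS-OGD only at a high level. The Python sketch below is a hypothetical scalar illustration of the ingredients it mentions (feedback around a steady state whose target is updated by online gradient steps); it is not the paper's SS-OGD. It assumes a system $x_{t+1} = a x_t + b u_t$, a stage cost $q(x_t - r_t)^2 + \rho u_t^2$, a fixed stabilizing gain `k`, and one OGD step per time step on an assumed steady-state surrogate loss $q(\theta - r_t)^2 + \rho (c\theta)^2$ with $c = (1 - a)/b$. The input is chosen before $r_t$ is revealed, matching the online protocol. The helper names (`path_length`, `ogd_tracking_cost`, `clairvoyant_cost`) and all numerical values are made up for illustration. `clairvoyant_cost` solves the same finite-horizon problem with full knowledge of the reference via a scalar Riccati recursion, so the difference of the two costs is an empirical dynamic regret for this toy setting.

```python
def path_length(refs):
    """Total variation sum_t |r_{t+1} - r_t| of a scalar reference sequence."""
    return sum(abs(r1 - r0) for r0, r1 in zip(refs, refs[1:]))


def stage_cost(x, r, u, q, rho):
    """Quadratic tracking cost q*(x - r)^2 + rho*u^2."""
    return q * (x - r) ** 2 + rho * u ** 2


def ogd_tracking_cost(refs, a, b, k, q, rho, eta, x0=0.0):
    """Cost of a hypothetical OGD steady-state tracker on refs = [r_0, ..., r_T]."""
    assert abs(a + b * k) < 1, "k must make a + b*k stable"
    c = (1.0 - a) / b  # a steady state x_s is held by the input u_s = c * x_s
    x, theta, total = x0, 0.0, 0.0
    for r in refs[:-1]:
        # Input uses only x_t and the target theta (built from r_0..r_{t-1}).
        u = c * theta + k * (x - theta)
        # r_t is revealed only now; pay the stage cost.
        total += stage_cost(x, r, u, q, rho)
        # One OGD step on the assumed steady-state surrogate loss.
        grad = 2.0 * q * (theta - r) + 2.0 * rho * c * c * theta
        theta -= eta * grad
        x = a * x + b * u
    return total + q * (x - refs[-1]) ** 2  # terminal tracking cost


def clairvoyant_cost(refs, a, b, q, rho, x0=0.0):
    """Optimal finite-horizon LQT cost when the whole reference is known."""
    T = len(refs) - 1
    P, s = q, -q * refs[T]  # value function V_T(x) = P x^2 + 2 s x + const
    gains = [None] * T
    for t in range(T - 1, -1, -1):
        den = rho + b * b * P
        gains[t] = (a * b * P / den, b * s / den)  # u_t = -(kx * x_t + ks)
        # Backward Riccati step; the right-hand side uses P_{t+1}, s_{t+1}.
        P, s = (q + a * a * P - (a * b * P) ** 2 / den,
                -q * refs[t] + a * s - a * b * b * P * s / den)
    x, total = x0, 0.0
    for t in range(T):
        kx, ks = gains[t]
        u = -(kx * x + ks)
        total += stage_cost(x, refs[t], u, q, rho)
        x = a * x + b * u
    return total + q * (x - refs[T]) ** 2


params = dict(a=1.2, b=1.0, q=1.0, rho=0.1)
for refs in ([1.0] * 51, [0.0] * 25 + [1.0] * 26):
    regret = (ogd_tracking_cost(refs, k=-0.9, eta=0.3, **params)
              - clairvoyant_cost(refs, **params))
    print(f"path length {path_length(refs):.1f}, empirical regret {regret:.3f}")
```

Because the benchmark is the exact minimizer of the same finite-horizon cost from the same initial state, the computed regret is never negative (up to rounding). Changing the size of the jump in the step reference is a quick way to see how the regret of this toy tracker responds to path length.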