Abstract: Online learning algorithms for dynamical systems provide finite time guarantees for control in the presence of sequentially revealed cost functions. We pose the classical linear quadratic tracking problem in the framework of online optimization where the time-varying reference state is unknown a priori and is revealed after the applied control input. We show the equivalence of this problem to the control of linear systems subject to adversarial disturbances and propose a novel online gradient descent based algorithm to achieve efficient tracking in finite time. We provide a dynamic regret upper bound scaling linearly with the path length of the reference trajectory and a numerical example to corroborate the theoretical guarantees.

What problem does this paper attempt to address?

The paper addresses the online Linear Quadratic Tracking (LQT) problem in scenarios where the reference trajectory is unknown and revealed gradually. Specifically, it reformulates the classical LQT problem as an online optimization problem in which the time-varying reference state is revealed only after the control input is applied. The authors propose a new algorithm based on Online Gradient Descent (OGD), called Stable State Online Gradient Descent (SS-OGD), to achieve efficient finite-time tracking, and they provide an upper bound on its dynamic regret.

### Main Contributions of the Paper:

1. **Problem Reformulation**: Reformulates the LQT problem as an online optimization problem with unknown, gradually revealed reference trajectories.
2. **New Algorithm**: Proposes the SS-OGD algorithm, which adapts the traditional OGD algorithm to dynamical systems.
3. **Performance Analysis**: Provides a dynamic regret upper bound that scales linearly with the path length of the reference trajectory.
4. **Numerical Validation**: Validates the theoretical results through numerical experiments that demonstrate the effectiveness of the SS-OGD algorithm.

### Key Concepts:

- **Online Linear Quadratic Tracking (LQT)**: The goal is to minimize the tracking error of the system when the reference trajectory is revealed gradually.
- **Dynamic Regret**: Measures the gap between the accumulated cost of the algorithm and the accumulated cost of the optimal benchmark over a finite time horizon.
- **Path Length**: Quantifies how much the reference trajectory changes; the higher the path length, the more drastic the changes in the reference trajectory.
- **Stable State Online Gradient Descent (SS-OGD)**: An improved online gradient descent algorithm capable of efficient tracking in dynamical systems.
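The two quantities in the Key Concepts list can be made concrete with a small sketch. It assumes the common convention that path length is the sum of Euclidean distances between consecutive reference points and that dynamic regret is the difference of accumulated costs; the paper's exact norm and benchmark may differ. The function names `path_length` and `dynamic_regret` are illustrative, not taken from the paper.

```python
import math


def path_length(refs):
    """Sum of Euclidean distances between consecutive reference points.

    Assumed definition: the paper may use a different norm.
    """
    return sum(math.dist(prev, cur) for prev, cur in zip(refs, refs[1:]))


def dynamic_regret(alg_costs, benchmark_costs):
    """Accumulated algorithm cost minus accumulated benchmark cost."""
    if len(alg_costs) != len(benchmark_costs):
        raise ValueError("cost sequences must cover the same horizon")
    return sum(alg_costs) - sum(benchmark_costs)


# A reference that jumps twice by distance 5 and stays put once.
refs = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0), (6.0, 8.0)]
print(path_length(refs))  # 10.0
print(dynamic_regret([1.0, 2.0, 3.0], [0.5, 1.0, 1.0]))  # 3.5
```

A regret bound that scales linearly with path length then says that the first quantity is at most a constant times the second, plus terms that do not depend on how the reference moves.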
### Background and Motivation:

- **Traditional LQT Problem**: Assumes the reference trajectory is known and fixed, and applies to many real-world scenarios such as aircraft trajectory tracking and industrial process control.
- **Need for Online Control**: In some scenarios, the reference trajectory is unknown and revealed gradually, which calls for algorithms that can adapt to such dynamic environments.

### Methods and Results:

- **Algorithm Design**: The SS-OGD algorithm introduces a stable state feedback term that corrects the deficiencies of the traditional OGD algorithm, resulting in better performance in dynamical systems.
- **Theoretical Analysis**: Proves that the dynamic regret upper bound of the SS-OGD algorithm scales linearly with the path length of the reference trajectory.
- **Experimental Validation**: Demonstrates the superior performance of the SS-OGD algorithm under different reference trajectories through numerical experiments with a quadrotor UAV model.

### Conclusion:

The paper addresses the online LQT problem for the case where the reference trajectory is unknown and revealed gradually. The proposed SS-OGD algorithm comes with theoretical performance guarantees and also performs well in the numerical experiments. Future research directions include further optimizing the regret coefficient and handling reference trajectories generated by unknown dynamics.
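To illustrate the setting in which the reference is revealed only after the input is applied, here is a minimal sketch of plain online gradient descent on the control input. It is not the SS-OGD algorithm, whose details this summary does not give. The scalar system `x_next = a * x + b * u`, the squared-error cost, the parameter values, and the function name `run_plain_ogd` are all assumptions chosen for illustration.

```python
def run_plain_ogd(refs, a=0.5, b=1.0, eta=0.2, x0=0.0, u0=0.0):
    """Track refs with plain OGD on the input of x_next = a * x + b * u.

    Hypothetical scalar example, not the paper's SS-OGD.
    Returns (states, costs): the state reached after each input and
    the squared tracking error paid at that step.
    """
    x, u = x0, u0
    states, costs = [], []
    for r in refs:
        # Apply the input that was chosen before the reference is known.
        x = a * x + b * u
        # The reference r is revealed only now; pay the tracking cost.
        err = x - r
        states.append(x)
        costs.append(err ** 2)
        # Gradient of (a * x_prev + b * u - r)^2 with respect to u is 2 * b * err.
        u -= eta * 2.0 * b * err
    return states, costs


# Constant reference: the state should settle at the reference value.
states, costs = run_plain_ogd([1.0] * 200)
print(abs(states[-1] - 1.0) < 1e-6)  # True
```

With these assumed parameters the closed loop is stable, so the tracking error decays for a constant reference. For a moving reference the input update always lags by one step, which is the kind of gap a path-length-dependent regret bound accounts for.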

Online Linear Quadratic Tracking with Regret Guarantees

A Learning-Based Optimal Tracking Controller for Continuous Linear Systems with Unknown Dynamics: Theory and Case Study

Predictive Linear Online Tracking for Unknown Targets

Online Control of Unknown Time-Varying Dynamical Systems

Asymptotic Tracking Controller Design for Nonlinear Systems with Guaranteed Performance.

Adaptive Gradient Online Control

Online convex optimization for robust control of constrained dynamical systems

Dynamical Models and Tracking Regret in Online Convex Programming

Online reinforcement learning control of unknown nonaffine nonlinear discrete time systems

Controlling Unknown Linear Dynamics with Almost Optimal Regret

Online Non-stochastic Control with Partial Feedback

Optimal trajectory tracking for uncertain linear discrete‐time systems using time‐varying Q‐learning

Learning Based Control Policy and Regret Analysis for Online Quadratic Optimization with Asymmetric Information Structure

Fully Adaptive Regret-Guaranteed Algorithm for Control of Linear Quadratic Systems

Learning Decentralized Linear Quadratic Regulators with $\sqrt{T}$ Regret

Tracking Slowly Moving Clairvoyant: Optimal Dynamic Regret of Online Learning with True and Noisy Gradient

Online Stackelberg Optimization via Nonlinear Control

Stronger Regret Bounds for Safe Online Reinforcement Learning in the Linear Quadratic Regulator

Perturbation-based Regret Analysis of Predictive Control in Linear Time Varying Systems

Data-Driven Adversarial Online Control for Unknown Linear Systems

Regret-optimal control in dynamic environments