Abstract: We propose an online learning algorithm that adaptively designs a decentralized linear quadratic regulator when the system model is unknown a priori and new data samples from a single system trajectory become progressively available. The algorithm uses a disturbance-feedback representation of state-feedback controllers coupled with online convex optimization with memory and delayed feedback. Under the assumption that the system is stable or that a stabilizing controller is known, we show that our controller enjoys an expected regret that scales as $\sqrt{T}$ with the time horizon $T$ in the case of a partially nested information pattern. For more general information patterns, the optimal controller is unknown even if the system model is known. In this case, the regret of our controller is shown with respect to a linear sub-optimal controller. We validate our theoretical findings using numerical experiments.
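
For reference, one common way to formalize the expected regret mentioned in the abstract is sketched here. The symbols are assumptions introduced for illustration and do not appear in the abstract: $c(x_t, u_t) = x_t^\top Q x_t + u_t^\top R u_t$ is a per-step quadratic cost, and $J_\star$ is the long-run average cost of the benchmark controller (the optimal decentralized controller in the partially nested case, or a linear sub-optimal controller otherwise).

$$
\mathrm{Regret}(T) \;=\; \mathbb{E}\left[\sum_{t=0}^{T-1} \bigl(c(x_t, u_t) - J_\star\bigr)\right]
$$

Under this reading, the abstract's claim is that $\mathrm{Regret}(T)$ grows on the order of $\sqrt{T}$. The paper's exact definition may differ in details such as indexing or normalization.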

What problem does this paper attempt to address?

The paper addresses the problem of designing an online learning algorithm that adaptively generates Decentralized Linear Quadratic Regulators (DLQR) when the system model is unknown. Specifically, it focuses on how to design control strategies that regulate a networked system in which the controller of each subsystem can access only part of the global state information. Traditional decentralized control research usually assumes a known system model, whereas this paper learns online from data samples drawn from a single system trajectory, without knowing the model in advance.

The main contributions of the paper include:

1. **Optimal Solution under Partially Nested Information Pattern**: Assuming the information pattern is partially nested, a controller structure based on the Disturbance-Feedback Controller (DFC) is proposed, and it is proven that this controller achieves an expected regret of order $\sqrt{T}$, where $T$ is the length of the time horizon.
2. **Online Convex Optimization Algorithm**: An online convex optimization (OCO) algorithm with memory and delayed feedback is designed to adapt the control strategy after the system model has been estimated.
3. **Theoretical Guarantees**: Under the assumption that the system is stable or that a stabilizing controller is known, it is proven that the expected regret of the proposed controller scales the same way as that of the centralized LQR controller, which is the best regret guarantee up to logarithmic factors.
4. **Extension to General Information Patterns**: For more general information patterns, the optimal controller is unknown even if the system model is known. The paper extends the theoretical results by measuring regret against a sub-optimal linear controller.
5. **Numerical Experiment Validation**: The theoretical analysis is validated through numerical experiments.

In summary, the paper aims to solve the problem of designing effective decentralized control strategies in networked systems when the system model is unknown and the controllers of each subsystem can only access partial global state information.
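
To make the disturbance-feedback idea concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes a known, stable two-state system and skips the learning step entirely. All names here (`simulate_dfc`, `M`, `mask`, the matrices `A` and `B`, the identity cost weights) are hypothetical choices for illustration. The control input is a linear function of the last few reconstructed disturbances, and a 0/1 mask stands in for the information constraint by zeroing the disturbance components a given controller is not allowed to use.

```python
import numpy as np


def simulate_dfc(A, B, M, mask, T, rng, noise_std=0.1):
    """Average cost of x_{t+1} = A x_t + B u_t + w_t under disturbance feedback.

    The input is u_t = sum_k (M[k] * mask) @ w_{t-1-k}, where M has shape
    (h, m, n) and mask has shape (m, n). The cost is x'x + u'u (assumed Q = R = I).
    """
    n = A.shape[0]
    h = len(M)
    x = np.zeros(n)
    past_w = [np.zeros(n) for _ in range(h)]  # most recent disturbance first
    total_cost = 0.0
    for _ in range(T):
        # Each controller only uses the disturbance entries its mask row allows.
        u = sum((M[k] * mask) @ past_w[k] for k in range(h))
        total_cost += x @ x + u @ u
        w = noise_std * rng.standard_normal(n)
        x_next = A @ x + B @ u + w
        # Reconstruct the disturbance from observed states (model assumed known).
        w_hat = x_next - A @ x - B @ u
        past_w = [w_hat] + past_w[:-1]
        x = x_next
    return total_cost / T


# Hypothetical example: two subsystems, memory length h = 2.
A = np.array([[0.5, 0.1], [0.0, 0.5]])   # stable by construction
B = np.eye(2)
mask = np.array([[1.0, 0.0],              # controller 1 uses only w[0]
                 [1.0, 1.0]])             # controller 2 uses both
M = np.stack([-0.4 * np.eye(2), -0.1 * np.ones((2, 2))])

avg_cost = simulate_dfc(A, B, M, mask, T=200, rng=np.random.default_rng(0))
print("average cost:", avg_cost)
```

In the setting the paper describes, the model is unknown, so the disturbances would have to be reconstructed from an estimated model, and the gains `M` would be updated online rather than fixed. This sketch only shows the controller parameterization.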

Learning Decentralized Linear Quadratic Regulators with $\sqrt{T}$ Regret

Decentralized Adaptive Iterative Learning Control for Interconnected Systems with Uncertainties

Online Actuator Selection and Controller Design for Linear Quadratic Regulation with Unknown System Model

Almost Surely $\sqrt{T}$ Regret Bound for Adaptive LQR

Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret

Stronger Regret Bounds for Safe Online Reinforcement Learning in the Linear Quadratic Regulator

Rate-matching the regret lower-bound in the linear quadratic regulator with unknown dynamics

Fully Adaptive Regret-Guaranteed Algorithm for Control of Linear Quadratic Systems

Learning to Control under Time-Varying Environment

Episodic Linear Quadratic Regulators with Low-rank Transitions

On the Sample Complexity of Decentralized Linear Quadratic Regulator with Partially Nested Information Structure

Regret Bounds for Episodic Risk-Sensitive Linear Quadratic Regulator

Nonasymptotic Regret Analysis of Adaptive Linear Quadratic Control with Model Misspecification

Regret Analysis of Online LQR Control via Trajectory Prediction and Tracking: Extended Version

Online Linear Quadratic Tracking with Regret Guarantees

Online Non-stochastic Control with Partial Feedback

Regret Lower Bounds for Learning Linear Quadratic Gaussian Systems

On Adaptive Linear-Quadratic Regulators

Learning to Control under Uncertainty with Data-Based Iterative Linear Quadratic Regulator

The Fundamental Limitations of Learning Linear-Quadratic Regulators