Non-Episodic Learning for Online LQR of Unknown Linear Gaussian System

Yiwen Lu, Yilin Mo
2021-01-01
Abstract: This paper considers the data-driven linear-quadratic regulation (LQR) problem where the system parameters are unknown and need to be identified online. In particular, the system operator is not allowed to perform multiple experiments by resetting the system to an initial state, a common approach in the system identification and data-driven control literature. Instead, we propose an algorithm that gains knowledge about the system from a single trajectory, and guarantee that both the identification error and the suboptimality of control performance in this trajectory converge simultaneously with probability one. Furthermore, we characterize the almost sure convergence rates of identification and control, and reveal an optimal trade-off between exploration and exploitation. A numerical example is provided to illustrate the effectiveness of our proposed strategy.
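For orientation, the textbook infinite-horizon LQR problem for a linear Gaussian system can be written as below. This is a sketch of the standard formulation, not the paper's own statement: the symbols $A$, $B$, $W$, $Q$, $R$, $P$, $K$ are assumed notation, and the usual conditions ($Q \succeq 0$, $R \succ 0$, $(A,B)$ stabilizable, $(A, Q^{1/2})$ detectable) are assumed to hold. In the setting of the abstract, $A$ and $B$ are unknown and would have to be estimated from a single trajectory.

```latex
% Dynamics with i.i.d. Gaussian process noise (assumed notation)
x_{k+1} = A x_k + B u_k + w_k, \qquad w_k \sim \mathcal{N}(0, W)

% Average quadratic cost to be minimized
J = \limsup_{T \to \infty} \frac{1}{T} \sum_{k=0}^{T-1}
    \left( x_k^\top Q x_k + u_k^\top R u_k \right)

% With A and B known, the optimal policy is linear state feedback
u_k = K x_k, \qquad K = -\left( R + B^\top P B \right)^{-1} B^\top P A

% where P solves the discrete-time algebraic Riccati equation
P = Q + A^\top P A - A^\top P B \left( R + B^\top P B \right)^{-1} B^\top P A

% and the resulting optimal average cost is
J^\star = \operatorname{tr}(W P)
```

In this notation, the suboptimality of a controller is the gap between its average cost and $J^\star$, and the identification error is the distance between the estimates of $(A, B)$ and their true values.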