Inverse Reinforcement Learning for Discrete-Time Linear Quadratic Systems

Meiling Yu, Yuanhua Ni
2024-01-01
Abstract: This article studies the discrete-time stochastic linear quadratic problem with process and observation noise in the average-cost setting, where the optimal policy is based on output feedback. We introduce a data-driven inverse reinforcement learning algorithm that reconstructs an unknown cost function and learns a near-optimal control policy solely from observed optimal behavior trajectories (input-output pairs). We first present a model-based inverse reinforcement learning approach that assumes known model parameters, and then prove that it is theoretically equivalent to the proposed data-driven approach. This equivalence establishes the theoretical soundness of the data-driven method and ensures convergence of the algorithm. Finally, numerical simulations demonstrate the effectiveness of the proposed algorithm, confirming that it reconstructs the cost function and learns an effective policy from demonstration trajectories when the cost function is unknown.
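To make the inverse problem concrete, the following minimal sketch shows the forward direction that inverse reinforcement learning has to invert: a quadratic cost (Q, R) determines a linear feedback gain through the Riccati equation. It is a state-feedback, noise-free simplification and not the output-feedback algorithm of the paper; the matrices A, B, Q, R and the helper `lqr_gain` are hypothetical values and names chosen only for illustration. It also shows why a cost can at best be reconstructed up to equivalence: scaling Q and R by the same positive factor leaves the gain unchanged.

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=500):
    """Iterate the discrete-time Riccati recursion; return gain K (u = -K x) and P."""
    P = Q.copy()
    for _ in range(iters):
        # Gain associated with the current value matrix P
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update: P <- Q + A'PA - A'PB (R + B'PB)^{-1} B'PA
        P = Q + A.T @ P @ (A - B @ K)
    return K, P

# Hypothetical system (a discretized double integrator) and "expert" cost
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.diag([1.0, 0.5])
R = np.array([[0.2]])

K_expert, _ = lqr_gain(A, B, Q, R)

# A rescaled cost produces the same gain, so demonstrations generated by
# K_expert cannot distinguish (Q, R) from (3Q, 3R).
K_scaled, _ = lqr_gain(A, B, 3.0 * Q, 3.0 * R)
gains_match = np.allclose(K_expert, K_scaled)

# Spectral radius of the closed loop A - B K; below 1 means stabilizing
rho = max(abs(np.linalg.eigvals(A - B @ K_expert)))
print("K_expert =", K_expert, "| same gain after scaling:", gains_match, "| rho =", rho)
```

In this simplified picture, an inverse method succeeds if the cost it recovers reproduces the demonstrated gain, even when the recovered matrices differ from the ones that generated the data.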