Inverse Value Iteration and Q-Learning: Algorithms, Stability, and Robustness

Bosen Lian, Wenqian Xue, Frank L. Lewis, Ali Davoudi
DOI: https://doi.org/10.1109/TNNLS.2024.3409182
2024-06-18
Abstract: This article proposes a data-driven, model-free inverse Q-learning algorithm for continuous-time linear quadratic regulators (LQRs). Using an agent's trajectories of states and optimal control inputs, the algorithm reconstructs a cost function for the agent that reproduces the same trajectories. The article first poses a model-based inverse value iteration scheme that uses the agent's system dynamics. An online model-free inverse Q-learning algorithm is then developed to recover the agent's cost function using only the demonstrated trajectories. It is more efficient than existing inverse reinforcement learning (RL) algorithms because it avoids repeatedly running RL in inner loops. The proposed algorithms do not need initial stabilizing control policies and yield unbiased solutions. The asymptotic stability, convergence, and robustness of the proposed algorithm are guaranteed. Theoretical analysis and simulation examples show the effectiveness and advantages of the proposed algorithms.
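To make the inverse LQR problem stated in the abstract concrete, the following minimal sketch works through a scalar, model-based case in closed form. It is not the article's inverse value iteration or inverse Q-learning algorithm; it only illustrates what it means for a reconstructed cost function to reproduce an agent's optimal behavior. The system xdot = a*x + b*u, the cost weights q and r, and the function names lqr_gain and recover_state_weight are all assumptions introduced here for illustration.

```python
import math

def lqr_gain(a, b, q, r):
    """Optimal gain k (u = -k*x) for scalar xdot = a*x + b*u
    with cost integral of (q*x^2 + r*u^2) dt; assumes b != 0, r > 0, q >= 0."""
    # Stabilizing root of the scalar Riccati equation 2*a*p - (b^2/r)*p^2 + q = 0.
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    return b * p / r

def recover_state_weight(a, b, k, r):
    """State weight q for which the gain k is optimal, given a chosen input weight r."""
    # Substitute p = r*k/b into the Riccati equation and solve for q.
    return r * (k * k - 2.0 * a * k / b)

# Hypothetical expert: optimal for q = 3, r = 1 on an unstable plant (a = 1).
a, b = 1.0, 1.0
k_expert = lqr_gain(a, b, q=3.0, r=1.0)

# The learner picks a different input weight (r = 2) and recovers a matching q.
q_hat = recover_state_weight(a, b, k_expert, r=2.0)
k_hat = lqr_gain(a, b, q_hat, r=2.0)
```

In this sketch the recovered pair (q_hat, r = 2) differs from the expert's (q = 3, r = 1), yet it yields the same gain and therefore the same closed-loop trajectories. The inverse problem thus determines an equivalent cost function rather than a unique one.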