Off-Policy Q-Learning for Infinite Horizon LQR Problem with Unknown Dynamics

Xinxing Li, Zhihong Peng, Li Liang
DOI: https://doi.org/10.1109/ISIE.2018.8433684
2018-01-01
Abstract: In this paper, a novel online Q-learning approach is proposed to solve the Infinite Horizon Linear Quadratic Regulator (IHLQR) problem for continuous-time (CT) linear time-invariant (LTI) systems. By employing an off-policy reinforcement learning (RL) technique, the proposed Q-learning algorithm improves the exploration of the state space. During the learning process, the algorithm can be implemented using only data sets that contain the behavior policy and the corresponding system state, and is thus data-driven. Moreover, the data sets can be reused, which is computationally efficient. A mild condition on the probing noise is established to ensure the convergence of the proposed Q-learning algorithm. Simulation results demonstrate the effectiveness of the developed algorithm.
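The abstract does not spell out the problem formulation. For orientation, the standard textbook statement of the infinite horizon LQR problem for a CT LTI system is sketched below. The symbols A, B, Q, R, P and K are conventional notation assumed here, not taken from the paper, and Q denotes the state weighting matrix rather than the Q-function.

```latex
\begin{align}
  % Assumed system model: A and B are the (unknown) dynamics matrices
  \dot{x}(t) &= A x(t) + B u(t), \qquad x(0) = x_0, \\
  % Infinite horizon quadratic cost with Q \succeq 0 and R \succ 0
  J(x_0, u) &= \int_0^{\infty} \left( x^{\top} Q x + u^{\top} R u \right) \mathrm{d}t, \\
  % Optimal state feedback in terms of the stabilizing solution P
  u^{*}(t) &= -K^{*} x(t), \qquad K^{*} = R^{-1} B^{\top} P, \\
  % Algebraic Riccati equation satisfied by P
  0 &= A^{\top} P + P A + Q - P B R^{-1} B^{\top} P .
\end{align}
```

Solving the Riccati equation directly requires A and B. A data-driven method of the kind described in the abstract instead aims to recover the optimal gain from measured state and input data.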