Model-based inverse reinforcement learning for deterministic systems

Ryan Self, Moad Abudia, S.M. Nahid Mahmud, Rushikesh Kamalapurkar
DOI: https://doi.org/10.1016/j.automatica.2022.110242
IF: 6.4
2022-06-01
Automatica
Abstract: This paper focuses on the development of an online, data-driven, model-based inverse reinforcement learning (MBIRL) technique for linear and nonlinear deterministic systems. Input and output trajectories of an agent under observation, which is attempting to optimize an unknown reward function, are used to estimate the reward function and the corresponding unknown optimal value function online and in real time. To achieve MBIRL using limited data, a novel feedback-driven approach to MBIRL is developed. The feedback policy and the dynamic model of the agent under observation are estimated from the measured data, and these estimates are used to generate synthetic data to drive MBIRL. Theoretical guarantees for ultimate boundedness of the estimation errors in general, and for convergence of the estimation errors to zero in special cases, are derived using Lyapunov techniques. Proof-of-concept numerical experiments demonstrate the utility of the developed method in solving linear and nonlinear inverse reinforcement learning problems.
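The abstract does not state the update laws, so what follows is only a minimal sketch of the underlying idea for the linear-quadratic case. It is not the authors' online, Lyapunov-based algorithm. It assumes a known (or already estimated) single-input model xdot = A x + B u, an observed linear feedback u = -K x, a quadratic reward with a diagonal state weight Q, and an input weight R fixed to 1 to remove the scale ambiguity of inverse reinforcement learning. Following the abstract's idea of generating synthetic data from the estimated policy and model, the sketch samples synthetic states, evaluates the Hamilton-Jacobi-Bellman (HJB) residual and the Hamiltonian stationarity condition at each one, and solves the resulting linear equations in (P, Q) by batch least squares. The helper names (`lyap`, `lqr_kleinman`, `inverse_lqr_diag`) are hypothetical. Kleinman's iteration is used only to produce a "true" expert policy for the demonstration.

```python
import numpy as np


def lyap(Acl, M):
    """Solve Acl^T P + P Acl + M = 0 via a Kronecker-product linear system."""
    n = Acl.shape[0]
    I = np.eye(n)
    # Column-major vec identity: vec(Acl^T P + P Acl) = (I kron Acl^T + Acl^T kron I) vec(P)
    L = np.kron(I, Acl.T) + np.kron(Acl.T, I)
    P = np.linalg.solve(L, -M.reshape(-1, order="F")).reshape(n, n, order="F")
    return 0.5 * (P + P.T)


def lqr_kleinman(A, B, Q, R, iters=50):
    """Forward LQR via Kleinman's iteration (used here only to build a demo 'expert').

    Assumes A is Hurwitz, so the initial gain K = 0 is stabilizing.
    """
    K = np.zeros((B.shape[1], A.shape[0]))
    for _ in range(iters):
        P = lyap(A - B @ K, Q + K.T @ R @ K)
        K = np.linalg.solve(R, B.T @ P)
    return P, K


def inverse_lqr_diag(A, B, K, n_samples=50, seed=0):
    """Hypothetical inverse step: recover diag(Q) and P for a 2-state, 1-input system.

    Assumes R = 1 (fixes the scale ambiguity) and a diagonal Q.
    Unknowns theta = [p11, p12, p22, q1, q2].
    """
    rng = np.random.default_rng(seed)
    b = B[:, 0]
    rows, rhs = [], []
    for x in rng.standard_normal((n_samples, 2)):  # synthetic states, not measured ones
        u = -(K @ x)[0]          # input the estimated policy would apply at x
        f = A @ x + b * u        # estimated closed-loop vector field at x
        # HJB residual with V(x) = x^T P x: 2 x^T P f + x^T Q x + u^2 = 0
        rows.append([2 * x[0] * f[0], 2 * (x[0] * f[1] + x[1] * f[0]),
                     2 * x[1] * f[1], x[0] ** 2, x[1] ** 2])
        rhs.append(-u ** 2)
        # Stationarity of the Hamiltonian in u (with R = 1): b^T P x + u = 0
        rows.append([b[0] * x[0], b[0] * x[1] + b[1] * x[0], b[1] * x[1], 0.0, 0.0])
        rhs.append(-u)
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    P = np.array([[theta[0], theta[1]], [theta[1], theta[2]]])
    return theta[3:], P


if __name__ == "__main__":
    A = np.array([[0.0, 1.0], [-1.0, -2.0]])  # Hurwitz (both eigenvalues at -1)
    B = np.array([[0.0], [1.0]])
    P_true, K_true = lqr_kleinman(A, B, np.diag([3.0, 0.5]), np.eye(1))
    # Demonstrations: observed states and the expert's inputs u = -K x
    X = np.random.default_rng(1).standard_normal((20, 2))
    U = -X @ K_true.T
    K_hat = -np.linalg.lstsq(X, U, rcond=None)[0].T  # feedback policy estimated from data
    q_hat, P_hat = inverse_lqr_diag(A, B, K_hat)
    print("recovered diag(Q):", q_hat)
    print("recovered P:\n", P_hat)
```

With noise-free demonstrations, the recovered diagonal should be close to the true values (3, 0.5), and `P_hat` should be close to the Riccati solution. Without the diagonal-Q restriction and the R = 1 normalization, the equations in this sketch no longer determine a unique solution. This reflects the non-uniqueness familiar from inverse reinforcement learning.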
Automation & Control Systems; Engineering, Electrical & Electronic
What problem does this paper attempt to address?