Adaptive Inverse Optimal Control for Linear Human-in-the-Loop Systems with Completely Unknown Dynamics

Mi Wang, Huai-Ning Wu
DOI: https://doi.org/10.1109/tase.2024.3487857
IF: 6.636
2024-01-01
IEEE Transactions on Automation Science and Engineering
Abstract: To improve machines' intelligence, machines need to learn from human behavior. In this paper, we make the reasonable hypothesis that, when performing a task, a human behaves like a linear quadratic regulator whose cost function is unknown to the machine. In addition, the system dynamics in many real applications are completely unknown. Our purpose is therefore to find a cost function equivalent to the human's, using only control input and system state data, for continuous-time linear human-in-the-loop (HiTL) systems with completely unknown dynamics. An adaptive inverse optimal control (IOC) method is proposed for this purpose. It helps the machine better understand human behavior and makes it possible to reproduce a similar optimal controller in other environments. Because the weighting matrix is difficult to obtain directly, an adaptive integral concurrent learning (ICL) algorithm is developed to identify the system matrices and the human feedback gain matrix online, which removes the need for persistent excitation (PE) conditions. The weighting matrix is then determined by solving a convex programming problem. Finally, simulation results on the lane-keeping assist system of an intelligent vehicle demonstrate the validity of the proposed adaptive IOC algorithm.
Note to Practitioners: In practice, it is hoped that a machine can work like a human so that it can replace the human in completing certain tasks. However, designing the corresponding algorithms is not easy, because many tests must be carried out to select appropriate parameters. An effective alternative is to teach the machine to learn from the human's demonstrated behavior. Notably, the environment (system dynamics) may not be known a priori, and only the system state and control input are measurable.
To this end, an adaptive IOC method is developed for imitation learning of the human's behavior; it runs online and requires only a limited amount of data. The proposed approach can be used in autonomous vehicles, service robots, medical rehabilitation, and similar applications. In future research, we will extend the proposed method to more complex environments.
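The two-stage idea in the abstract can be illustrated with a small numerical sketch. The first stage identifies the dynamics and the human gain from data. The second stage backs out a weighting matrix that makes the observed gain LQR-optimal. This sketch rests on several stated assumptions and is not the paper's algorithm. Batch least squares on windowed integrals stands in for the adaptive ICL update law. The input matrix B is assumed square and invertible, and R = I is fixed, since (Q, R) and (cQ, cR) yield the same gain. Under these assumptions, B^T P = R K fixes P, and Q follows directly from the Riccati equation A^T P + P A - K^T R K + Q = 0. The matrices A_TRUE, B_TRUE, and Q_TRUE, the step DT, and the helpers lyap, lqr_gain, and simulate are all hypothetical.

```python
import numpy as np

# Hypothetical ground truth (assumption, not from the paper): 2 states, 2 inputs,
# so B is square and invertible and B^T P = R K determines P uniquely.
A_TRUE = np.array([[-1.0, 0.5], [0.0, -2.0]])
B_TRUE = np.array([[1.0, 0.0], [0.5, 1.0]])
Q_TRUE = np.diag([2.0, 1.0])  # the human's hidden state weight
R = np.eye(2)                 # input weight, fixed to resolve the scale ambiguity
DT = 0.01                     # forward-Euler step (assumption)


def lyap(Ac, W):
    """Solve Ac^T P + P Ac + W = 0 for P via column-major Kronecker vectorization."""
    n = Ac.shape[0]
    I = np.eye(n)
    M = np.kron(I, Ac.T) + np.kron(Ac.T, I)
    vec_p = np.linalg.solve(M, -W.flatten(order="F"))
    return vec_p.reshape((n, n), order="F")


def lqr_gain(A, B, Q, R, iters=30):
    """Kleinman iteration from K = 0 (valid here only because A is Hurwitz)."""
    K = np.zeros((B.shape[1], A.shape[0]))
    for _ in range(iters):
        P = lyap(A - B @ K, Q + K.T @ R @ K)
        K = np.linalg.solve(R, B.T @ P)
    return K


def simulate(x0, policy, steps):
    """Forward-Euler rollout; returns states x_0..x_steps and inputs u_0..u_{steps-1}."""
    xs, us = [x0], []
    for k in range(steps):
        u = policy(xs[-1], k)
        xs.append(xs[-1] + DT * (A_TRUE @ xs[-1] + B_TRUE @ u))
        us.append(u)
    return np.array(xs), np.array(us)


# Step 0: the simulated "human" acts as an LQR with the hidden weight Q_TRUE.
K_TRUE = lqr_gain(A_TRUE, B_TRUE, Q_TRUE, R)

# Step 1: identify A and B from x(t+T) - x(t) = A*int(x) + B*int(u) over windows,
# using an excited input (a random probing signal is an assumption of this sketch).
rng = np.random.default_rng(0)
noise = rng.normal(size=(2000, 2))
xs, us = simulate(np.array([1.0, -1.0]), lambda x, k: -K_TRUE @ x + noise[k], 2000)
win = 10
phi, y = [], []
for s in range(0, len(us) - win + 1, win):
    phi.append(np.concatenate([xs[s:s + win].sum(0), us[s:s + win].sum(0)]) * DT)
    y.append(xs[s + win] - xs[s])
theta, *_ = np.linalg.lstsq(np.array(phi), np.array(y), rcond=None)
A_hat, B_hat = theta[:2].T, theta[2:].T

# Step 2: identify the human gain from demonstrations where u = -K x exactly.
demos = [simulate(x0, lambda x, k: -K_TRUE @ x, 200) for x0 in np.eye(2)]
X = np.vstack([xd[:-1] for xd, _ in demos])
U = np.vstack([ud for _, ud in demos])
Kt, *_ = np.linalg.lstsq(X, -U, rcond=None)  # solves X @ K^T = -U
K_hat = Kt.T

# Step 3: recover Q from the Riccati equation, using B^T P = R K to pin down P.
P_hat = np.linalg.solve(B_hat.T, R @ K_hat)
P_hat = 0.5 * (P_hat + P_hat.T)  # remove numerical asymmetry
Q_hat = K_hat.T @ R @ K_hat - A_hat.T @ P_hat - P_hat @ A_hat
print(np.round(Q_hat, 6))
```

When m < n, B^T P = R K leaves P underdetermined. One would then search over symmetric P subject to that equality and Q ⪰ 0, which is a semidefinite feasibility problem. This is one way to read the convex-programming step in the abstract, not necessarily the paper's exact formulation.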
What problem does this paper attempt to address?