PI-ELM: Reinforcement learning-based adaptable policy improvement for dynamical system

Yingbai Hu, Xu Wang, Yueyue Liu, Weiping Ding, Alois Knoll
DOI: https://doi.org/10.1016/j.ins.2023.119700
IF: 8.1
2023-09-22
Information Sciences
Abstract: Behavioral cloning, a form of imitation learning, is theoretically sound and can capture and reproduce motor skills from expert demonstrations, but it adapts poorly to new environments when only a small dataset is available. This study improves adaptability by proposing a novel reinforcement learning strategy for low-level behavioral learning with a small number of expert demonstrations. Specifically, the policy improvement-based reinforcement learning framework is divided into two phases: the low level uses supervised learning with an extreme learning machine (ELM) to clone the behavior from demonstrations, which can then be represented as a dynamical system with policy parameters; the high level uses reinforcement learning to improve the adaptability of the ELM in new tasks. In this paper, we bridge the gap between machine learning and stochastic optimal control systems and propose PI-ELM, an improved path integral-based reinforcement learning strategy that learns the policy parameters of the low-level ELM. The proposed framework's performance and effectiveness are illustrated through several task experiments. The results indicate that our method can significantly improve the adaptability of imitation learning in new scenarios, including single-task obstacle avoidance, via-points, anti-disturbance, and hybrid tasks.
computer science, information systems
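The two phases named in the abstract can be illustrated with a minimal sketch. It is not the authors' PI-ELM algorithm. It only shows the two generic ingredients: an ELM fitted in closed form to demonstrations, and a path-integral-style update that perturbs the ELM output weights and averages the perturbations with cost-dependent weights. The function names (`elm_fit`, `elm_predict`, `pi_update`), the tanh activation, the ridge term, the cost normalization, the toy system x_dot = -x, and the via-point cost are all assumptions made for illustration.

```python
import numpy as np

def elm_fit(X, Y, n_hidden=30, reg=1e-4, seed=0):
    """Fit an ELM: fixed random hidden layer, ridge least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.normal(size=n_hidden)                # random hidden biases, never trained
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    # Closed-form ridge solution for the output weights (the policy parameters)
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ Y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Evaluate the ELM; the output is linear in beta."""
    return np.tanh(X @ W + b) @ beta

def pi_update(beta, cost_fn, n_rollouts=20, sigma=0.05, lam=0.1, seed=0):
    """One path-integral-style step: perturb beta, score, softmax-average the noise."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(scale=sigma, size=(n_rollouts,) + beta.shape)
    costs = np.array([cost_fn(beta + e) for e in eps])
    # Rescale costs to [0, 1] so the temperature lam has a consistent meaning (assumption)
    scaled = (costs - costs.min()) / (costs.max() - costs.min() + 1e-12)
    weights = np.exp(-scaled / lam)
    weights /= weights.sum()
    # Low-cost perturbations contribute more to the parameter update
    return beta + np.tensordot(weights, eps, axes=1)

# Low level: clone a toy "expert" dynamical system x_dot = -x from 100 samples
X = np.linspace(-1.0, 1.0, 100).reshape(-1, 1)
Y = -X
W, b, beta = elm_fit(X, Y)
fit_error = float(np.mean(np.abs(elm_predict(X, W, b, beta) - Y)))

# High level: hypothetical new task asking for velocity 0.5 at state x = 0.5
x_via = np.array([[0.5]])

def via_cost(candidate):
    return float((elm_predict(x_via, W, b, candidate)[0, 0] - 0.5) ** 2)

beta_new = pi_update(beta, via_cost)
```

Only the output weights `beta` are adapted in this sketch; the random hidden layer stays fixed, which keeps the policy linear in its parameters. In practice the update would be repeated over many iterations with fresh noise, and the cost would come from full trajectory rollouts rather than a single state.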
What problem does this paper attempt to address?