Model-free PAC Time-Optimal Control Synthesis with Reinforcement Learning

Mengyu Liu, Pengyuan Lu, Xin Chen, Oleg Sokolsky, Insup Lee, Fanxin Kong
DOI: https://doi.org/10.1109/memocode63347.2024.00009
2024-01-01
Abstract: Reaching a target safely and quickly is a control goal pursued by various applications, such as post-disaster rescue robots and industrial shipment. However, it is hard to formally guarantee safety and time-optimality under unknown dynamics via model-free controller synthesis algorithms. In response, we propose a model-free reinforcement learning (RL) algorithm that synthesizes a controller to reach a predefined target set of states with a probabilistic guarantee of time optimality, i.e., the actual reaching time is bounded close to the shortest time possible with high probability, and the bound becomes tighter when more training data is sampled. Our algorithm leverages a reward function that is based on signal temporal logic (STL) robustness to reward fast reaching. With this reward function, we prove that Probably Approximately Correct (PAC) optimality in the state-value function implies PAC optimality in reach time. Then, we build our algorithm by extending the Delayed Gaussian Process Q-learning (DGPQ) algorithm with a safety margin to protect the controlled agent. Consequently, our algorithm guarantees safety and a PAC bound in recovery time. Experiments show that our method achieves a 97.7% success rate in reaching the target within the maximum time tolerance and outperforms the baselines.
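To make the notion of STL robustness concrete, the following minimal sketch computes the standard quantitative semantics of an "eventually reach the target" specification over a finite trajectory. It is an illustration only and is not the reward function defined in the paper: the Euclidean-ball target set, the function names `reach_robustness` and `first_reach_step`, and the sample trajectory are all assumptions introduced here.

```python
import math

def reach_robustness(trajectory, center, radius):
    """Robustness of 'eventually be within `radius` of `center`'.

    Standard STL semantics for an 'eventually' operator: the maximum over
    time of the predicate margin (radius minus distance to the center).
    A non-negative value means the trajectory enters the target set.
    """
    return max(radius - math.dist(state, center) for state in trajectory)

def first_reach_step(trajectory, center, radius):
    """Index of the first state inside the target set, or None if never reached."""
    for step, state in enumerate(trajectory):
        if radius - math.dist(state, center) >= 0.0:
            return step
    return None

# Hypothetical 2-D trajectory approaching a unit ball around the origin.
trajectory = [(4.0, 0.0), (2.0, 0.0), (0.5, 0.0), (0.0, 0.0)]
center = (0.0, 0.0)
radius = 1.0

print(reach_robustness(trajectory, center, radius))
print(first_reach_step(trajectory, center, radius))
```

A reward that favors fast reaching would, under these assumptions, prefer trajectories whose margin becomes non-negative at an earlier step; how the paper shapes its reward to obtain the PAC reach-time guarantee is not specified in the abstract.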