Safe Reinforcement Learning via Probabilistic Timed Computation Tree Logic

Li Qian, Jing Liu
DOI: https://doi.org/10.1109/ijcnn48605.2020.9207384
2020-01-01
Abstract: Reinforcement learning aims to discover an optimal policy that maximizes reward based on feedback signals. Although the method has succeeded in numerous systems, it may not be applicable to safety-critical systems because it lacks a safety protection mechanism. Moreover, the agent cannot model the environment accurately if its observations are biased. To address these limitations, we present a safe algorithm called Safe Control with Supervisor (SCS). If the model is accurate, the supervisor monitors the system and repairs the agent's actions at runtime, guiding the system to obey a specification expressed in probabilistic timed Computation Tree Logic (ptCTL). If not, the supervisor maximizes the probability of satisfying a given task specification. We validate our method through experiments on adaptive cruise control under uncertainty.
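The following is a minimal Python sketch of a generic runtime action-repair (shielding) supervisor over a small finite MDP. It is only meant to make the idea of a supervisor that checks and repairs an agent's action concrete. It is not the paper's SCS algorithm. Everything in it is a hypothetical assumption: the MDP, the discrete step bound `horizon` standing in for ptCTL's timing constraints, the probability `threshold`, the toy cruise-control states, and the names `bounded_safety_values` and `supervise`. It also assumes an accurate model, so it does not address biased observations. The supervisor keeps the agent's action when the maximal probability of avoiding unsafe states for `horizon` steps meets the threshold. Otherwise it substitutes the action with the highest such probability.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical finite MDP: state -> action -> [(next_state, probability), ...].
# Every state is assumed to have at least one action (use self-loops if needed).
MDP = Dict[str, Dict[str, List[Tuple[str, float]]]]


def bounded_safety_values(mdp: MDP, unsafe: Set[str], steps: int) -> Dict[str, float]:
    """Maximal probability, per state, of avoiding `unsafe` for `steps` more
    transitions, computed by finite-horizon value iteration."""
    # With no steps left, a state satisfies the property iff it is safe.
    values = {s: 0.0 if s in unsafe else 1.0 for s in mdp}
    for _ in range(steps):
        values = {
            s: 0.0 if s in unsafe else max(
                sum(p * values[t] for t, p in succ) for succ in mdp[s].values()
            )
            for s in mdp
        }
    return values


def supervise(mdp: MDP, unsafe: Set[str], horizon: int, threshold: float,
              state: str, proposed: str) -> str:
    """Hypothetical runtime supervisor: keep the agent's action if it meets the
    probability bound, otherwise substitute the most-likely-safe action."""
    values = bounded_safety_values(mdp, unsafe, horizon - 1)
    # Probability of staying safe for `horizon` steps after taking each action.
    q = {a: sum(p * values[t] for t, p in succ) for a, succ in mdp[state].items()}
    if q[proposed] >= threshold:
        return proposed
    # Repair: the argmax meets the bound whenever any action does;
    # if none does, it still maximizes the probability of satisfaction.
    return max(q, key=q.get)


# Toy adaptive-cruise-control model (illustrative numbers, not from the paper).
acc = {
    "far":   {"accelerate": [("close", 0.6), ("far", 0.4)],
              "keep":       [("far", 0.9), ("close", 0.1)],
              "brake":      [("far", 1.0)]},
    "close": {"accelerate": [("crash", 0.5), ("close", 0.5)],
              "keep":       [("crash", 0.1), ("close", 0.6), ("far", 0.3)],
              "brake":      [("far", 0.8), ("close", 0.2)]},
    "crash": {"stay":       [("crash", 1.0)]},
}

print(supervise(acc, {"crash"}, horizon=3, threshold=0.95,
                state="close", proposed="accelerate"))  # -> brake
```

Recomputing the safety values on every call keeps the sketch short. A real supervisor would precompute them offline, since the model is fixed while the agent acts.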