Synthesis of Controllers for Co-Safe Linear Temporal Logic Specifications Using Reinforcement Learning

Xiaohua Ren, Xiang Yin, Shaoyuan Li
DOI: https://doi.org/10.23919/ccc52363.2021.9549746
2021-01-01
Abstract: Interest in controller synthesis for complex tasks is growing rapidly [1, 2]. In most cases the environment is unknown, which limits the applicability of traditional control methods. In this paper, we use reinforcement learning to learn to achieve complex tasks optimally in unknown environments. Specifically, we model the uncertain environment as a Markov decision process (MDP) and describe the high-level control objective in syntactically co-safe linear temporal logic (scLTL). In this setting, we propose a new reward design procedure: the proposed reward function exploits information from the automaton induced by the scLTL task. We compare the proposed reward function with existing approaches in standard grid-world environments and show that, with our reward function, learning converges faster and the resulting policy achieves the scLTL task optimally.
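To make the setting concrete, here is a minimal sketch of Q-learning on the product of a grid world and a task automaton. It is not the reward function proposed in the paper, which the abstract does not specify. Everything in it is an assumption made for illustration: the task "eventually a, and afterwards eventually b", the hand-written three-state automaton, the 4x4 grid and its labels, the slip probability, and the reward that pays 1 on acceptance plus a bonus of 0.1 whenever the automaton moves closer to an accepting state.

```python
import random
from collections import deque

# Hypothetical automaton for "eventually a, and afterwards eventually b".
# States: 0 (nothing seen), 1 (a seen), 2 (accepting).
DFA_STATES = [0, 1, 2]
DFA_ACCEPTING = {2}

def dfa_step(q, label):
    """Advance the automaton on the label of the cell just entered."""
    if q == 0 and label == "a":
        return 1
    if q == 1 and label == "b":
        return 2
    return q

def distance_to_accept():
    """Backward BFS: fewest automaton transitions from each state to acceptance."""
    labels = ["a", "b", ""]
    dist = {q: 0 for q in DFA_ACCEPTING}
    queue = deque(DFA_ACCEPTING)
    while queue:
        target = queue.popleft()
        for q in DFA_STATES:
            if q not in dist and any(dfa_step(q, l) == target for l in labels):
                dist[q] = dist[target] + 1
                queue.append(q)
    return dist

DIST = distance_to_accept()

def reward(q, q_next):
    """Assumed reward: 1 on acceptance, 0.1 for progress in the automaton."""
    if q_next in DFA_ACCEPTING and q not in DFA_ACCEPTING:
        return 1.0
    if DIST[q_next] < DIST[q]:
        return 0.1
    return 0.0

# Hypothetical 4x4 grid world; the learner never reads the slip probability.
SIZE = 4
LABELS = {(0, 3): "a", (3, 3): "b"}
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def env_step(cell, action, rng, slip=0.1):
    """Move one cell; with probability `slip` a random action is taken instead."""
    if rng.random() < slip:
        action = rng.choice(ACTIONS)
    row = min(max(cell[0] + action[0], 0), SIZE - 1)
    col = min(max(cell[1] + action[1], 0), SIZE - 1)
    return (row, col)

def q_learning(episodes=2000, alpha=0.1, gamma=0.95, eps=0.1, horizon=50, seed=0):
    """Tabular Q-learning over product states (grid cell, automaton state)."""
    rng = random.Random(seed)
    Q = {}
    n = len(ACTIONS)
    for _ in range(episodes):
        cell, q = (0, 0), 0
        for _ in range(horizon):
            s = (cell, q)
            if rng.random() < eps:
                a = rng.randrange(n)  # explore
            else:
                a = max(range(n), key=lambda i: Q.get((s, i), 0.0))  # exploit
            cell_next = env_step(cell, ACTIONS[a], rng)
            q_next = dfa_step(q, LABELS.get(cell_next, ""))
            r = reward(q, q_next)
            done = q_next in DFA_ACCEPTING
            s_next = (cell_next, q_next)
            best_next = 0.0 if done else max(Q.get((s_next, i), 0.0) for i in range(n))
            old = Q.get((s, a), 0.0)
            Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
            cell, q = cell_next, q_next
            if done:
                break
    return Q
```

Calling `q_learning()` returns a table keyed by `((cell, automaton_state), action_index)`. Because the automaton state is part of the learner's state, the reward is Markovian on the product even though the task itself depends on history. Whether an intermediate bonus of this kind preserves the optimal policy depends on the task and the bonus size, so the value 0.1 should be treated as a tunable assumption.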