Abstract: This paper proposes an advanced Reinforcement Learning (RL) method that incorporates reward shaping, safety value functions, and a quantum action selection algorithm. The method is model-free and can synthesize a finite policy that maximizes the probability of satisfying a complex task. Although RL is a promising approach, it suffers from unsafe traps and sparse rewards, which make it impractical for real-world problems. To improve safety during training, we introduce the concept of safety values, which results in a model-based adaptive scenario due to online updates of transition probabilities. High-level complex tasks are usually formulated in formal languages such as Linear Temporal Logic (LTL). Another novelty of this work is the use of an Embedded Limit-Deterministic Generalized Büchi Automaton (E-LDGBA) to represent an LTL formula. The resulting deterministic policy generalizes to tasks over both infinite and finite horizons. We design an automaton-based reward, and the theoretical analysis shows that an agent following the optimal policy accomplishes the task specification with maximum probability. Furthermore, a reward shaping process is developed to avoid sparse rewards and enforce RL convergence while keeping the optimal policies invariant. In addition, inspired by quantum computing, we propose a quantum action selection algorithm to replace the existing ε-greedy algorithm for balancing exploration and exploitation during learning. Simulations demonstrate that the proposed framework achieves good performance, dramatically reducing the number of visits to unsafe states while converging to optimal policies.
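For orientation, the two standard ingredients the abstract refers to can be sketched as follows. This is a minimal illustration, not the paper's algorithm: `epsilon_greedy` is the conventional baseline that the proposed quantum action selection is meant to replace, and `shaped_reward` uses the common potential-based form r + γΦ(s') − Φ(s), which is assumed here because it is a well-known way to keep optimal policies invariant. The abstract does not state the shaping function or the potential Φ, so the function names, parameters, and numbers below are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Baseline action selection (not the paper's quantum variant).

    With probability epsilon pick a uniformly random action index,
    otherwise pick the index with the largest estimated Q-value.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def shaped_reward(reward, phi_s, phi_next, gamma):
    """Potential-based shaping (assumed form): r + gamma * Phi(s') - Phi(s).

    phi_s and phi_next are values of a hypothetical potential function
    evaluated at the current and the next state.
    """
    return reward + gamma * phi_next - phi_s


if __name__ == "__main__":
    q = [0.1, 0.7, 0.2]  # illustrative Q-values for three actions
    print(epsilon_greedy(q, epsilon=0.0))  # purely greedy choice
    print(epsilon_greedy(q, epsilon=1.0, rng=random.Random(0)))  # purely random
    # A sparse reward of 0 becomes informative once the potential increases.
    print(shaped_reward(0.0, phi_s=1.0, phi_next=2.0, gamma=0.9))
```

Setting `epsilon` to 0 gives a purely greedy agent and setting it to 1 a purely exploratory one; values in between trade the two off, which is the balance the abstract says the quantum action selection addresses.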

Security-Aware Reinforcement Learning under Linear Temporal Logic Specifications

Reinforcement Learning with Temporal Logic Constraints for Partially-Observable Markov Decision Processes

Safe Reinforcement Learning for Signal Temporal Logic Tasks Using Robust Control Barrier Functions

Secure-by-Construction Controller Synthesis for Stochastic Systems under Linear Temporal Logic Specifications

Model-Free Reinforcement Learning for Stochastic Games with Linear Temporal Logic Objectives

Reinforcement Learning for Temporal Logic Control Synthesis with Probabilistic Satisfaction Guarantees

Synthesis of Controllers for Co-Safe Linear Temporal Logic Specifications Using Reinforcement Learning

Certified Reinforcement Learning with Logic Guidance

Temporal Logic Guided Safe Reinforcement Learning Using Control Barrier Functions

Regret-Free Reinforcement Learning for LTL Specifications

Sample Efficient Model-free Reinforcement Learning from LTL Specifications with Optimality Guarantees

Deep Reinforcement Learning with Temporal Logics

Directed Exploration in Reinforcement Learning from Linear Temporal Logic

Reinforcement Learning Based Temporal Logic Control with Maximum Probabilistic Satisfaction

Safe reinforcement learning under temporal logic with reward design and quantum action selection

Mission-driven Exploration for Accelerated Deep Reinforcement Learning with Temporal Logic Task Specifications

Robust Satisfaction of Temporal Logic Specifications via Reinforcement Learning

Secure-by-Construction Optimal Path Planning for Linear Temporal Logic Tasks

Formal Control Synthesis Via Safe Reinforcement Learning under Real-Time Specifications

Sample-Efficient Reinforcement Learning with Temporal Logic Objectives: Leveraging the Task Specification to Guide Exploration

Safe Reinforcement Learning with Probabilistic Guarantees Satisfying Temporal Logic Specifications in Continuous Action Spaces