Abstract: Reinforcement learning (RL) has transformed decision-making across a wide range of domains over the past few decades. Deploying RL policies in real-world scenarios, however, raises the crucial challenge of ensuring safety. Traditional safe RL approaches have predominantly focused on incorporating predefined safety constraints into the policy learning process. This reliance is limiting in dynamic and unpredictable real-world settings, where such constraints may be unavailable or insufficiently adaptable. To bridge this gap, we propose a novel approach that concurrently learns a safe RL control policy and identifies the unknown safety constraint parameters of a given environment. Starting from a parametric signal temporal logic (pSTL) safety specification and a small initial labeled dataset, we frame the problem as a bilevel optimization task that integrates constrained policy optimization, using a Lagrangian variant of the twin delayed deep deterministic policy gradient (TD3) algorithm, with Bayesian optimization of the parameters of the given pSTL safety specification. In comprehensive case studies, we validate the approach across varying forms of environmental constraints, and it consistently yields safe RL policies with high returns. Furthermore, the learned STL safety constraint parameters conform closely to the true environmental safety constraints. The performance of our model closely mirrors that of an ideal scenario with complete prior knowledge of the safety constraints, demonstrating that it can accurately identify environmental safety constraints and learn safe policies that adhere to them.
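To make the bilevel structure described in the abstract more concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical one-parameter pSTL template G(|x_t| <= c), replaces Bayesian optimization with a plain grid search over candidate values of c, and omits the TD3 actor-critic update, showing only the projected dual-ascent step that a Lagrangian method applies to its multiplier. All function and variable names are illustrative.

```python
from typing import List, Sequence, Tuple

Trajectory = Sequence[float]
LabeledData = List[Tuple[Trajectory, bool]]


def robustness_always_below(traj: Trajectory, c: float) -> float:
    """Robustness of the assumed template G(|x_t| <= c): min over t of c - |x_t|."""
    return min(c - abs(x) for x in traj)


def label_accuracy(c: float, data: LabeledData) -> float:
    """Fraction of trajectories whose robustness sign agrees with the safety label."""
    hits = sum(
        (robustness_always_below(traj, c) >= 0.0) == safe for traj, safe in data
    )
    return hits / len(data)


def fit_parameter(data: LabeledData, candidates: Sequence[float]) -> float:
    """Outer level: choose the candidate c that best explains the labels.

    Grid search is used here as a simple stand-in for Bayesian optimization.
    """
    return max(candidates, key=lambda c: label_accuracy(c, data))


def dual_update(lam: float, avg_cost: float, budget: float, lr: float = 0.1) -> float:
    """Inner level (multiplier only): projected gradient ascent on lambda.

    lambda grows when the average constraint cost exceeds the budget and is
    clipped at zero otherwise. The policy update itself is omitted.
    """
    return max(0.0, lam + lr * (avg_cost - budget))


# Hypothetical labeled dataset: (trajectory, is_safe).
labeled: LabeledData = [
    ([0.1, 0.4, 0.8], True),
    ([0.2, -0.9, 0.3], True),
    ([0.5, 1.2, 0.7], False),
    ([-1.5, 0.0, 0.1], False),
]
candidates = [0.5, 1.0, 1.5, 2.0]

c_hat = fit_parameter(labeled, candidates)
print("estimated threshold:", c_hat)
print("label accuracy:", label_accuracy(c_hat, labeled))
```

In a full pipeline of the kind the abstract describes, the two levels would alternate: the current parameter estimate defines the constraint used during policy training, and newly labeled rollouts refine the parameter estimate.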
