Abstract: Reinforcement learning (RL) has demonstrated remarkable success across various domains. Nonetheless, ensuring safety remains a significant challenge, particularly when RL is deployed in safety-critical applications such as robotics and autonomous driving. In this work, we develop a robust and safe RL methodology grounded in manifold space. We first construct a constrained manifold space that takes safety constraints into account. We then propose a robust safe RL approach, supported by theoretical analysis and based on the value at risk (VaR) and conditional value at risk (CVaR), to enhance the robustness of safety. Our methodology is designed to ensure safety in environments with stochastic constraints. Building on the theoretical analysis, we develop a practical algorithm that searches for a robust safe policy on stochastic constraint manifolds (ROSCOM). We evaluate our approach on circular-motion and air-hockey tasks. Our experiments demonstrate that ROSCOM outperforms existing baselines in terms of both reward and safety.

Note to Practitioners: Real-world applications often involve inherent uncertainties, noise, and high-dimensional spaces. This complexity makes ensuring safety in robot learning both urgent and challenging, especially when RL is implemented in practical environments. To address this issue, we build a stochastic constraint manifold that delineates the safe space, thus establishing a rigorous framework for robot learning at each iteration. Compared with state-of-the-art baselines, our method improves both safety and reward. For example, in an air-hockey robot learning task, our method achieves a 50% improvement in safety performance over the ATACOM framework while also obtaining higher reward. Moreover, compared with traditional algorithms, including CPO and PCPO, our method achieves a 99% improvement in safety performance together with significantly higher reward. These empirical results indicate that our approach is not only theoretically sound but also effective in practice, and they suggest its potential as a useful tool in real robot learning and beyond.
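
The abstract names value at risk and conditional value at risk as the risk measures behind the robust safety criterion but does not define them. As a rough illustration only, the sketch below computes empirical estimates of both from sampled constraint values. The function name `var_cvar`, the level `alpha`, and the sign convention (larger values mean a worse violation, with values at or below zero considered safe) are assumptions made for this example; it is not the ROSCOM algorithm itself.

```python
import numpy as np


def var_cvar(costs, alpha=0.95):
    """Empirical VaR and CVaR of sampled costs at level alpha.

    Assumed convention (not taken from the paper): larger cost = worse.
    VaR is the alpha-quantile of the samples; CVaR is the mean of the
    samples that lie at or above that quantile (the worst-case tail).
    """
    costs = np.asarray(costs, dtype=float)
    var = np.quantile(costs, alpha)
    tail = costs[costs >= var]  # never empty: the maximum is always >= the quantile
    cvar = tail.mean()
    return var, cvar


if __name__ == "__main__":
    # Hypothetical noisy constraint values c(s); c(s) <= 0 is treated as safe.
    rng = np.random.default_rng(0)
    samples = rng.normal(loc=-0.5, scale=0.3, size=10_000)
    var, cvar = var_cvar(samples, alpha=0.95)
    print(f"mean={samples.mean():.3f}  VaR={var:.3f}  CVaR={cvar:.3f}")
```

Because CVaR averages over the worst tail, it is never smaller than VaR at the same level. Requiring the CVaR of a stochastic constraint to stay at or below zero is therefore a stricter condition than bounding its mean or its VaR, which is one common reason for using it as a robustness criterion.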

Safe Reinforcement Learning via Probabilistic Timed Computation Tree Logic

Safe Reinforcement Learning for CPSs via Formal Modeling and Verification

Dependable Reinforcement Learning Via Timed Differential Dynamic Logic

Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings

Model-Based Safe Reinforcement Learning with Time-Varying State and Control Constraints: An Application to Intelligent Vehicles

Learning-based Model Predictive Control for Safe Exploration and Reinforcement Learning

Safe Reinforcement Learning via Hierarchical Adaptive Chance-Constraint Safeguards

Model-Based Safe Reinforcement Learning With Time-Varying Constraints: Applications to Intelligent Vehicles

Safe Reinforcement Learning for Signal Temporal Logic Tasks Using Robust Control Barrier Functions

ROSCOM: Robust Safe Reinforcement Learning on Stochastic Constraint Manifolds

Safe Reinforcement Learning with Probabilistic Guarantees Satisfying Temporal Logic Specifications in Continuous Action Spaces

Formal Control Synthesis Via Safe Reinforcement Learning under Real-Time Specifications

End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks

Adaptive Safe Reinforcement Learning with Full-State Constraints and Constrained Adaptation for Autonomous Vehicles

Synthesis of Controllers for Co-Safe Linear Temporal Logic Specifications Using Reinforcement Learning

Long and Short-Term Constraints Driven Safe Reinforcement Learning for Autonomous Driving

Concurrent Learning of Policy and Unknown Safety Constraints in Reinforcement Learning

Reinforcement Learning for Temporal Logic Control Synthesis with Probabilistic Satisfaction Guarantees

SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization

Statistically Model Checking PCTL Specifications on Markov Decision Processes via Reinforcement Learning

Signal Temporal Logic Neural Predictive Control