Abstract: In recent years, Deep Reinforcement Learning (DRL) has emerged as an effective approach to solving real-world tasks. However, despite their successes, DRL-based policies suffer from poor reliability, which limits their deployment in safety-critical domains. Various methods have been put forth to address this issue by providing formal safety guarantees. The two main approaches are shielding and verification. Shielding ensures the safe behavior of the policy by employing an external online component (i.e., a "shield") that overrides potentially dangerous actions, but it has a significant computational cost, as the shield must be invoked at runtime to validate every decision. Verification, on the other hand, is an offline process that can identify unsafe policies prior to their deployment, yet it does not provide alternative actions when a policy is deemed unsafe. In this work, we present verification-guided shielding, a novel approach that bridges the DRL reliability gap by integrating these two methods. Our approach combines both formal and probabilistic verification tools to partition the input domain into safe and unsafe regions. In addition, we employ clustering and symbolic representation procedures that compress the unsafe regions into a compact representation. This, in turn, allows the shield to be activated temporarily, and efficiently, only in (potentially) unsafe regions. Our approach significantly reduces runtime overhead while still preserving formal safety guarantees. We extensively evaluate our approach on two benchmarks from the robotic navigation domain, and provide an in-depth analysis of its scalability and completeness.

What problem does this paper attempt to address?

This paper addresses the poor reliability of deep reinforcement learning (DRL) policies, which limits their use in safety-critical areas despite DRL's success on practical tasks. Specifically, the paper points out that DRL policies cannot guarantee correct behavior under all possible inputs. This hinders the full integration of DRL agents into autonomous driving systems, robot controllers, and decision support in healthcare and regulated industries, where even a single error may have serious consequences.

Existing methods fall mainly into two categories: shielding and verification. Shielding ensures the safe behavior of a policy by using an external component (i.e., a "shield") to override potentially dangerous actions at runtime, but this incurs significant computational cost. Verification is an offline process that can identify unsafe policies before deployment, but it cannot provide alternative actions when a policy is found to be unsafe. The paper therefore proposes a new method, verification-guided shielding, which combines the advantages of both: it reduces runtime overhead while maintaining formal safety guarantees.

The method consists of the following steps:

1. **Domain Partitioning**: Use a verification algorithm to identify the regions of the state space where the DRL agent behaves correctly; the remaining regions are treated as (potentially) unsafe.
2. **Formal Verification of Safe Regions**: Formally verify the regions initially identified as safe, to confirm that the agent is indeed safe there.
3. **Clustering**: Cluster the unsafe regions to reduce their number, thereby reducing the overhead of checking whether the current input belongs to an unsafe region.
4. **Symbolic Representation**: Encode the unsafe regions symbolically as concise formulas, further reducing the overhead of online checks.
5. **Shield Synthesis and Execution**: Activate the shield only in potentially unsafe regions and keep it inactive in regions proven safe, so that formal safety guarantees are retained.

In this way, the paper aims to significantly reduce the runtime overhead of traditional shielding while still maintaining the safety of the policy. The method is evaluated extensively on two benchmark environments, Particle World and Mapless Navigation, to assess its effectiveness and scalability.
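The runtime side of the steps above can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes, purely for illustration, that unsafe regions are axis-aligned boxes, that "clustering" means replacing a group of boxes by their bounding box, and that the policy and shield are plain callables. All names (`Box`, `bounding_box`, `make_guarded_policy`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

State = Sequence[float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned region (an assumption): lower[i] <= x[i] <= upper[i]."""
    lower: Tuple[float, ...]
    upper: Tuple[float, ...]

    def contains(self, x: State) -> bool:
        return all(lo <= v <= hi for lo, v, hi in zip(self.lower, x, self.upper))


def bounding_box(boxes: List[Box]) -> Box:
    """Merge a cluster of unsafe boxes into one covering box.

    The result over-approximates the unsafe set, so merging can only make
    the shield fire more often, never less.
    """
    dims = range(len(boxes[0].lower))
    lower = tuple(min(b.lower[i] for b in boxes) for i in dims)
    upper = tuple(max(b.upper[i] for b in boxes) for i in dims)
    return Box(lower, upper)


def make_guarded_policy(
    policy: Callable[[State], str],
    shield: Callable[[State, str], str],
    unsafe_regions: List[Box],
) -> Tuple[Callable[[State], str], Dict[str, int]]:
    """Wrap a policy so the shield is consulted only inside unsafe regions."""
    calls = {"shield": 0}  # counts how often the shield was actually invoked

    def act(state: State) -> str:
        action = policy(state)
        if any(region.contains(state) for region in unsafe_regions):
            calls["shield"] += 1
            return shield(state, action)  # the shield may override the action
        return action  # state lies outside every unsafe region: skip the shield

    return act, calls


# Hypothetical stand-ins for a trained policy and a synthesized shield.
policy = lambda state: "forward"
shield = lambda state, action: "stop"

# Two overlapping unsafe boxes compressed into a single region.
unsafe = [bounding_box([Box((0.0, 0.0), (0.2, 0.2)), Box((0.1, 0.1), (0.3, 0.3))])]
act, calls = make_guarded_policy(policy, shield, unsafe)

print(act((0.9, 0.9)))    # outside the unsafe region: the policy's action is used
print(act((0.25, 0.05)))  # inside the merged box: the shield decides
```

Note that the state `(0.25, 0.05)` lies in neither of the two original boxes, only in their bounding box. This is the trade-off of compressing the unsafe regions in this sketch: fewer membership checks per step, at the price of occasionally invoking the shield where it was not strictly needed.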

Verification-Guided Shielding for Deep Reinforcement Learning

Realizable Continuous-Space Shields for Safe Reinforcement Learning

Approximate Model-Based Shielding for Safe Reinforcement Learning

Learning-Based Shielding for Safe Autonomy under Unknown Dynamics

Online Shielding for Reinforcement Learning

Human-Feedback Shield Synthesis for Perceived Safety in Deep Reinforcement Learning

Safe Reinforcement Learning via Probabilistic Shields

Automata Learning meets Shielding

Safe Reinforcement Learning in Black-Box Environments via Adaptive Shielding

Reachability Verification Based Reliability Assessment for Deep Reinforcement Learning Controlled Robotics and Autonomous Systems

Compositional Shielding and Reinforcement Learning for Multi-Agent Systems

Shielding Atari Games with Bounded Prescience

Formally Verifying Deep Reinforcement Learning Controllers with Lyapunov Barrier Certificates

Evaluating the Safety of Deep Reinforcement Learning Models using Semi-Formal Verification

Leveraging Approximate Model-based Shielding for Probabilistic Safety Guarantees in Continuous Environments

Dynamic Shielding for Reinforcement Learning in Black-Box Environments

Safe and Reliable Training of Learning-Based Aerospace Controllers

A Dynamic Safety Shield for Safe and Efficient Reinforcement Learning of Navigation Tasks

Safe Multi-Agent Reinforcement Learning Via Dynamic Shielding

TRAINIFY: A CEGAR-Driven Training and Verification Framework for Safe Deep Reinforcement Learning

Safety through Permissibility: Shield Construction for Fast and Safe Reinforcement Learning