Abstract: Learning reliably safe autonomous control is one of the core problems in trustworthy autonomy. However, training a controller that can be formally verified to be safe remains a major challenge. We introduce a novel approach for learning verified safe control policies in nonlinear neural dynamical systems while maximizing overall performance. Our approach aims to achieve safety in the sense of finite-horizon reachability proofs, and comprises three key parts. The first is a novel curriculum learning scheme that iteratively increases the verified safe horizon. The second exploits the iterative nature of gradient-based learning for incremental verification, reusing information from prior verification runs. Finally, we learn multiple verified initial-state-dependent controllers, an idea that is especially valuable for more complex domains where learning a single universal verified safe controller is extremely challenging. Our experiments on five safe control problems demonstrate that our trained controllers can achieve verified safety over horizons that are as much as an order of magnitude longer than state-of-the-art baselines, while maintaining high reward as well as a perfect safety record over entire episodes. Our code is available at https://github.com/jlwu002/VSRL.
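To make "verified safe horizon" concrete, the following Python sketch computes a finite-horizon reachability bound with plain interval bound propagation (IBP). It pushes a box of initial states through a ReLU-network policy and the dynamics one step at a time, and reports how many steps the over-approximated reachable set provably stays inside a safe box. Everything here is an illustrative assumption, not the authors' code: the helper names (`affine_bounds`, `relu_bounds`, `policy_bounds`, `step_bounds`, `verified_horizon`), the linear dynamics x' = Ax + Bu standing in for the paper's neural network dynamics, and IBP itself, which is typically looser than linear-relaxation verifiers.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]
Box = List[Interval]
Matrix = Sequence[Sequence[float]]


def affine_bounds(box: Box, W: Matrix, b: Sequence[float]) -> Box:
    """Sound interval image of x -> W x + b (one interval per output row)."""
    out = []
    for row, bias in zip(W, b):
        lo = hi = bias
        for w, (l, h) in zip(row, box):
            # A non-negative weight maps lower bound to lower bound; a negative one swaps them.
            lo += w * l if w >= 0 else w * h
            hi += w * h if w >= 0 else w * l
        out.append((lo, hi))
    return out


def relu_bounds(box: Box) -> Box:
    """ReLU is monotone, so it can be applied to each bound directly."""
    return [(max(l, 0.0), max(h, 0.0)) for l, h in box]


def policy_bounds(box: Box, layers) -> Box:
    """Bounds on a ReLU MLP policy; `layers` is a list of (W, b) pairs."""
    for i, (W, b) in enumerate(layers):
        box = affine_bounds(box, W, b)
        if i < len(layers) - 1:  # no ReLU after the output layer
            box = relu_bounds(box)
    return box


def step_bounds(box: Box, layers, A: Matrix, B: Matrix) -> Box:
    """One closed-loop step x' = A x + B pi(x); x and u are bounded independently."""
    u_box = policy_bounds(box, layers)
    AB = [list(ra) + list(rb) for ra, rb in zip(A, B)]  # [A | B] acting on (x, u)
    return affine_bounds(box + u_box, AB, [0.0] * len(A))


def inside(box: Box, safe: Box) -> bool:
    return all(sl <= l and h <= sh for (l, h), (sl, sh) in zip(box, safe))


def verified_horizon(init: Box, safe: Box, layers, A: Matrix, B: Matrix, max_horizon: int) -> int:
    """Largest K <= max_horizon whose over-approximated reachable sets for steps 1..K lie in `safe`."""
    box = init
    for t in range(1, max_horizon + 1):
        box = step_bounds(box, layers, A, B)
        if not inside(box, safe):
            return t - 1
    return max_horizon


# Hypothetical toy system and policy: x' = 0.9 x + 0.1 u with u = -x (one linear layer).
A_toy, B_toy = [[0.9]], [[0.1]]
policy = [([[-1.0]], [0.0])]
print(verified_horizon([(0.5, 1.0)], [(0.2, 1.0)], policy, A_toy, B_toy, max_horizon=10))  # prints 2
```

Here the certified horizon is 2 steps, although the exact closed loop x' = 0.8x keeps every initial state in [0.5, 1.0] inside [0.2, 1.0] for 4 steps. The gap comes from bounding x and u independently, which illustrates why the tightness of the reachability bound directly limits how long a horizon can be verified.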

What problem does this paper attempt to address?

This paper addresses the problem of training controllers for nonlinear neural dynamical systems that can be formally verified as safe while still achieving high performance. Specifically, it targets safety in the sense of finite-horizon reachability proofs: the controller must provably keep the system safe for a fixed number of steps, while overall performance is maximized. This is a core challenge for autonomous systems that require reliability guarantees, such as navigation and obstacle avoidance for self-driving cars or drones. Existing methods often provide only empirical safety assessments rather than strict, deterministic guarantees. The paper therefore proposes a new method that overcomes these limitations and verifies safety over much longer horizons while maintaining high reward. Its main contributions are:
1. A safe optimal control framework that combines finite-horizon verified (worst-case) safety constraints with empirical (average-case) safety constraints.
2. A novel curriculum learning scheme that uses memory, forward reachability analysis, and differentiable reachability over-approximation to efficiently learn verified safe policies.
3. A method for learning a set of initial-state-dependent control policies, which substantially extends the verified safe horizon over large initial-state sets.
4. An incremental verification method that exploits the small parameter changes between successive gradient-based updates to speed up verification during learning.
5. An extensive experimental evaluation on five safe control problems, in which the proposed method achieves verified safe horizons up to an order of magnitude longer than state-of-the-art safe reinforcement learning baselines.

Through these contributions, the paper offers a new perspective and practical techniques for learning controllers with verified safety in nonlinear dynamical systems.
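The horizon curriculum in contribution 2 can be illustrated on a toy problem. The Python sketch below is hypothetical throughout: the unstable scalar system x' = 1.1x + 0.5u, the linear policy u = -kx, the safe and initial intervals, and the helpers `scale`, `step`, `violation`, `verified` and `curriculum` are all invented for illustration. It trains the single gain k by finite-difference gradient descent on a hinge penalty computed from interval over-approximations of the reachable sets, raises the target horizon by one step each time the current horizon is verified, and keeps a `memory` of verified gains. This is one simple reading of "curriculum with memory"; it does not reproduce the paper's differentiable bounds, incremental verification, or neural dynamics.

```python
# Hypothetical unstable scalar system x' = A*x + B*u with linear policy u = -k*x.
A, B = 1.1, 0.5
SAFE = (-1.0, 1.0)  # hypothetical safe interval
INIT = (0.6, 0.8)   # hypothetical initial-state interval


def scale(iv, c):
    """Interval image of x -> c * x."""
    lo, hi = c * iv[0], c * iv[1]
    return (min(lo, hi), max(lo, hi))


def step(iv, k):
    """Interval over-approximation of one closed-loop step (x and u bounded independently)."""
    u = scale(iv, -k)
    ax, bu = scale(iv, A), scale(u, B)
    return (ax[0] + bu[0], ax[1] + bu[1])


def violation(k, horizon):
    """Hinge penalty: how far the reachable intervals for steps 1..horizon leave SAFE."""
    total, iv = 0.0, INIT
    for _ in range(horizon):
        iv = step(iv, k)
        total += max(0.0, SAFE[0] - iv[0]) + max(0.0, iv[1] - SAFE[1])
    return total


def verified(k, horizon):
    # Zero penalty means every over-approximated interval lies inside SAFE.
    return violation(k, horizon) == 0.0


def curriculum(k=0.0, max_horizon=20, lr=0.05, steps_per_stage=200, eps=1e-4):
    """Grow the target horizon one step at a time, remembering each verified gain."""
    memory = {}  # horizon -> gain verified for that horizon
    horizon = 1
    while horizon <= max_horizon:
        for _ in range(steps_per_stage):
            if verified(k, horizon):
                break
            # Central finite difference as a stand-in for autodiff through the bounds.
            grad = (violation(k + eps, horizon) - violation(k - eps, horizon)) / (2 * eps)
            k -= lr * grad
        if not verified(k, horizon):
            break  # stage failed; memory still holds the last verified gains
        memory[horizon] = k
        horizon += 1
    return memory


memory = curriculum()
best_h = max(memory)
print(best_h, round(memory[best_h], 3))
```

With k = 0 the system is only verified for 2 steps, so the curriculum must adjust the gain before it can certify longer horizons. Raising the horizon gradually keeps each stage's penalty small and informative, whereas optimizing directly for a long horizon starts from a heavily violated, poorly conditioned objective.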

Verified Safe Reinforcement Learning for Neural Network Dynamic Models

Scalable Synthesis of Verified Controllers in Deep Reinforcement Learning

Safe Reinforcement Learning via Formal Methods: Toward Safe Control Through Proof and Learning

Model-free Neural Lyapunov Control for Safe Robot Navigation

Safe and Reliable Training of Learning-Based Aerospace Controllers

Learning-Based Verification of Stochastic Dynamical Systems with Neural Network Policies

Reachability Verification Based Reliability Assessment for Deep Reinforcement Learning Controlled Robotics and Autonomous Systems

Look Before You Leap: Safe Model-Based Reinforcement Learning with Human Intervention

Model-Based Safe Reinforcement Learning with Time-Varying State and Control Constraints: An Application to Intelligent Vehicles

Model-Based Safe Reinforcement Learning With Time-Varying Constraints: Applications to Intelligent Vehicles

Provably Safe Neural Network Controllers via Differential Dynamic Logic

Safe Model-Based Reinforcement Learning with an Uncertainty-Aware Reachability Certificate

End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks

Sampling-based Safe Reinforcement Learning for Nonlinear Dynamical Systems

Runtime Safety Assurance Using Reinforcement Learning

Model-Free Safe Reinforcement Learning Through Neural Barrier Certificate

Formally Verifying Deep Reinforcement Learning Controllers with Lyapunov Barrier Certificates

Safe Reinforcement Learning via a Model-Free Safety Certifier

Safe Reinforcement Learning with Probabilistic Guarantees Satisfying Temporal Logic Specifications in Continuous Action Spaces

Implicit Safe Set Algorithm for Provably Safe Reinforcement Learning