ISAACS: Iterative Soft Adversarial Actor-Critic for Safety

Kai-Chieh Hsu, Duy Phuong Nguyen, Jaime Fernández Fisac

2024-06-08

Abstract: The deployment of robots in uncontrolled environments requires them to operate robustly under previously unseen scenarios, like irregular terrain and wind conditions. Unfortunately, while rigorous safety frameworks from robust optimal control theory scale poorly to high-dimensional nonlinear dynamics, control policies computed by more tractable "deep" methods lack guarantees and tend to exhibit little robustness to uncertain operating conditions. This work introduces a novel approach enabling scalable synthesis of robust safety-preserving controllers for robotic systems with general nonlinear dynamics subject to bounded modeling error by combining game-theoretic safety analysis with adversarial reinforcement learning in simulation. Following a soft actor-critic scheme, a safety-seeking fallback policy is co-trained with an adversarial "disturbance" agent that aims to invoke the worst-case realization of model error and training-to-deployment discrepancy allowed by the designer's uncertainty. While the learned control policy does not intrinsically guarantee safety, it is used to construct a real-time safety filter (or shield) with robust safety guarantees based on forward reachability rollouts. This shield can be used in conjunction with a safety-agnostic control policy, precluding any task-driven actions that could result in loss of safety. We evaluate our learning-based safety approach in a 5D race car simulator, compare the learned safety policy to the numerically obtained optimal solution, and empirically validate the robust safety guarantee of our proposed safety shield against worst-case model discrepancy.

Machine Learning, Robotics, Systems and Control

What problem does this paper attempt to address?

The paper addresses the safe operation of robots in uncontrolled environments: how to keep a robot with high-dimensional nonlinear dynamics safe under uncertain operating conditions such as irregular terrain and wind. The authors propose ISAACS (Iterative Soft Adversarial Actor-Critic for Safety), which combines game-theoretic safety analysis with adversarial reinforcement learning in simulation. Following a soft actor-critic scheme, ISAACS co-trains a safety-seeking fallback policy with an adversarial "disturbance" agent that tries to invoke the worst-case model error and training-to-deployment discrepancy allowed by the designer's uncertainty. The learned control policy does not by itself guarantee safety, but it is used to construct a real-time safety filter (or shield) based on forward reachability rollouts, and this filter carries robust safety guarantees. The filter runs alongside a safety-agnostic, task-driven control policy and precludes any task action that could result in loss of safety. In summary, the paper aims to provide a scalable way to synthesize robust safety-preserving controllers for robotic systems with general nonlinear dynamics subject to bounded modeling error. The approach is evaluated in a 5D race car simulator, where the learned safety policy is compared to the numerically obtained optimal solution and the shield is validated empirically against worst-case model discrepancy.
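The shielding idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the 1D point-mass dynamics, the constants, and the hand-written `fallback_policy` and `adversary_policy` are hypothetical stand-ins for the learned networks. The check here also simulates only a single adversarial trajectory, so unlike the paper's shield it carries no robust safety guarantee.

```python
from typing import Tuple

State = Tuple[float, float]  # (position, velocity) of a toy 1D point mass

# All constants below are illustrative assumptions, not values from the paper.
DT = 0.1        # integration time step
U_MAX = 1.0     # control bound
D_MAX = 0.2     # disturbance bound (the designer's uncertainty set)
X_LIMIT = 1.0   # the state is safe while |position| <= X_LIMIT


def clip(value: float, bound: float) -> float:
    """Clamp value to the interval [-bound, bound]."""
    return max(-bound, min(bound, value))


def step(state: State, u: float, d: float) -> State:
    """One Euler step of the toy dynamics with control u and disturbance d."""
    x, v = state
    return (x + v * DT, v + (clip(u, U_MAX) + clip(d, D_MAX)) * DT)


def margin(state: State) -> float:
    """Safety margin: negative once the position limit is violated."""
    return X_LIMIT - abs(state[0])


def fallback_policy(state: State) -> float:
    """Hand-written stand-in for the learned fallback policy: brake."""
    return clip(-state[1] / DT, U_MAX)


def adversary_policy(state: State) -> float:
    """Hand-written stand-in for the learned adversary: push to the nearer wall."""
    return D_MAX if state[0] >= 0 else -D_MAX


def rollout_is_safe(state: State, first_action: float, horizon: int) -> bool:
    """Apply first_action once, then the fallback, always against the adversary."""
    s = step(state, first_action, adversary_policy(state))
    if margin(s) < 0:
        return False
    for _ in range(horizon):
        s = step(s, fallback_policy(s), adversary_policy(s))
        if margin(s) < 0:
            return False
    return True


def shield(state: State, task_action: float, horizon: int = 50) -> float:
    """Pass the task action through only if the simulated rollout stays safe."""
    if rollout_is_safe(state, task_action, horizon):
        return task_action
    return fallback_policy(state)
```

In this sketch, `shield((0.0, 0.0), 1.0)` passes the task action through, while a state close to the limit and moving toward it, such as `(0.9, 1.0)`, triggers the fallback action. The override cannot rescue a state that is already unrecoverable, so a filter of this kind is only meaningful when the system starts from a state where the fallback rollout is safe.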

Robust Safe Reinforcement Learning under Adversarial Disturbances

Adaptive robust control algorithm for enhanced path-tracking performance of automated driving in critical scenarios

SAAC: Safe Reinforcement Learning as an Adversarial Game of Actor-Critics

Improved Robustness and Safety for Autonomous Vehicle Control with Adversarial Reinforcement Learning

Safe Reinforcement Learning and Adaptive Optimal Control With Applications to Obstacle Avoidance Problem

Deception Game: Closing the Safety-Learning Loop in Interactive Robot Autonomy

Pay Attention to How You Drive: Safe and Adaptive Model-Based Reinforcement Learning for Off-Road Driving

Robots that Learn to Safely Influence via Prediction-Informed Reach-Avoid Dynamic Games

Safety-Aware Preference-Based Learning for Safety-Critical Control

Learning Adaptive Safety for Multi-Agent Systems

A General Safety Framework for Learning-Based Control in Uncertain Robotic Systems

MAGICS: Adversarial RL with Minimax Actors Guided by Implicit Critic Stackelberg for Convergent Neural Synthesis of Robot Safety

Safe Deep Policy Adaptation

Dynamic Simplex: Balancing Safety and Performance in Autonomous Cyber Physical Systems

Sim-to-Lab-to-Real: Safe Reinforcement Learning with Shielding and Generalization Guarantees

State and Input Constrained Output-Feedback Adaptive Optimal Control of Affine Nonlinear Systems

A Learning-Based Framework for Safe Human-Robot Collaboration with Multiple Backup Control Barrier Functions

Safe Human-Interactive Control via Shielding

Safe Reinforcement Learning with Nonlinear Dynamics via Model Predictive Shielding

Active Uncertainty Reduction for Safe and Efficient Interaction Planning: A Shielding-Aware Dual Control Approach