ISAACS: Iterative Soft Adversarial Actor-Critic for Safety

Kai-Chieh Hsu, Duy Phuong Nguyen, Jaime Fernández Fisac
2024-06-08
Abstract: The deployment of robots in uncontrolled environments requires them to operate robustly under previously unseen scenarios, like irregular terrain and wind conditions. Unfortunately, while rigorous safety frameworks from robust optimal control theory scale poorly to high-dimensional nonlinear dynamics, control policies computed by more tractable "deep" methods lack guarantees and tend to exhibit little robustness to uncertain operating conditions. This work introduces a novel approach enabling scalable synthesis of robust safety-preserving controllers for robotic systems with general nonlinear dynamics subject to bounded modeling error by combining game-theoretic safety analysis with adversarial reinforcement learning in simulation. Following a soft actor-critic scheme, a safety-seeking fallback policy is co-trained with an adversarial "disturbance" agent that aims to invoke the worst-case realization of model error and training-to-deployment discrepancy allowed by the designer's uncertainty. While the learned control policy does not intrinsically guarantee safety, it is used to construct a real-time safety filter (or shield) with robust safety guarantees based on forward reachability rollouts. This shield can be used in conjunction with a safety-agnostic control policy, precluding any task-driven actions that could result in loss of safety. We evaluate our learning-based safety approach in a 5D race car simulator, compare the learned safety policy to the numerically obtained optimal solution, and empirically validate the robust safety guarantee of our proposed safety shield against worst-case model discrepancy.
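The game-theoretic safety analysis mentioned in the abstract can be illustrated on a toy problem where the "numerically obtained optimal solution" is cheap to compute. The sketch below runs minimax safety value iteration on a gridded 1D system using the standard discrete-time recursion V(x) = min{ g(x), max_u min_d V(f(x, u, d)) }, where g(x) >= 0 means "not failed", the controller maximizes, and the disturbance minimizes. This is not the ISAACS algorithm, which learns its policies through soft actor-critic training rather than gridding the state space, and the paper's exact value formulation may differ. The dynamics x' = x + dt(x + u + d), the bounds |u| <= 1 and |d| <= 0.5, the margin g(x) = 1 - |x|, and the function name safety_value_iteration are all hypothetical choices made for illustration.

```python
import numpy as np


def safety_value_iteration(xs, g, f, us, ds, tol=1e-6, max_iters=10_000):
    """Minimax safety value iteration on a 1D grid (illustrative sketch).

    Iterates V(x) <- min(g(x), max_u min_d V(f(x, u, d))) starting from V = g.
    The iteration is monotone non-increasing, so it converges from above.
    """
    # Precompute every (state, control, disturbance) combination once.
    X, U, D = np.meshgrid(xs, us, ds, indexing="ij")
    # Next states, clipped to the grid so interpolation stays in range.
    X_next = np.clip(f(X, U, D), xs[0], xs[-1])
    V = g.copy()
    for _ in range(max_iters):
        # Linearly interpolate the current value at every next state.
        V_next = np.interp(X_next, xs, V)
        worst_case = V_next.min(axis=2)         # disturbance minimizes over d
        best_response = worst_case.max(axis=1)  # controller maximizes over u
        V_new = np.minimum(g, best_response)    # failure at any time counts
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


# Hypothetical toy system: unstable drift x, control |u| <= 1, disturbance |d| <= 0.5.
dt = 0.05
xs = np.linspace(-2.0, 2.0, 201)
g = 1.0 - np.abs(xs)  # safety margin: failure when |x| > 1
us = np.linspace(-1.0, 1.0, 5)
ds = np.linspace(-0.5, 0.5, 3)


def f(x, u, d):
    return x + dt * (x + u + d)


V = safety_value_iteration(xs, g, f, us, ds)
# Value at x = 0.0 is positive (robustly safe); value at x = 0.9 is negative.
print(np.interp([0.0, 0.9], xs, V))
```

The state x = 0.9 is currently safe (g = 0.1) but has a negative value: once |x| exceeds about 0.5, the drift plus the worst-case disturbance outpaces the bounded control, so failure is unavoidable. Distinguishing such states from truly recoverable ones is what a safety value, learned or computed, is for.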
Machine Learning, Robotics, Systems and Control
What problem does this paper attempt to address?
The paper addresses the safe operation of robots in uncontrolled environments: how to keep a robot with high-dimensional nonlinear dynamics operating robustly and safely when its model is uncertain and it may face previously unseen conditions, such as irregular terrain or wind. Rigorous safety frameworks from robust optimal control scale poorly to such systems, while more tractable deep-learning controllers lack guarantees and tend to be brittle under uncertain operating conditions.

To bridge this gap, the authors propose ISAACS (Iterative Soft Adversarial Actor-Critic for Safety), which combines game-theoretic safety analysis with adversarial reinforcement learning in simulation. Following a soft actor-critic scheme, ISAACS iteratively co-trains a safety-seeking fallback policy with an adversarial "disturbance" agent that tries to realize the worst-case model error and training-to-deployment discrepancy permitted by the designer's uncertainty bounds.

The learned fallback policy does not by itself guarantee safety. Instead, it is used to build a real-time safety filter (shield) based on forward reachability rollouts, which provides robust safety guarantees. The shield can run alongside any safety-agnostic task policy and blocks task-driven actions that could lead to a loss of safety during deployment. In summary, the paper aims to provide a scalable method for synthesizing robust safety-preserving controllers for nonlinear, high-dimensional robotic systems subject to bounded model uncertainty.
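The accept-or-override logic of a rollout-based shield can be sketched in a few lines. In this sketch, assuming a known dynamics function and hand-written stand-ins for the learned fallback and disturbance policies, the proposed task action is applied for one step against the adversary, control then passes to the fallback policy, and the closed loop is rolled forward for a fixed horizon; if any predicted state enters the failure set, the task action is replaced by the fallback action. The names shield, fallback, adversary, margin, and horizon, as well as the toy 1D system, are hypothetical. Because it simulates only the single disturbance sequence produced by the stand-in adversary, this sketch does not by itself certify safety against every admissible disturbance; it only shows the structure of the filter.

```python
from typing import Callable

Policy = Callable[[float], float]


def shield(x: float, u_task: float,
           f: Callable[[float, float, float], float],
           fallback: Policy, adversary: Policy,
           margin: Callable[[float], float], horizon: int = 100) -> float:
    """Rollout-based safety filter (illustrative sketch).

    Accept u_task only if, after applying it for one step and then switching
    to the fallback policy, a rollout against the adversary never reaches
    margin < 0 within `horizon` steps. Otherwise return fallback(x).
    """
    # Step 1: apply the proposed task action under the adversary's disturbance.
    x_next = f(x, u_task, adversary(x))
    if margin(x_next) < 0:
        return fallback(x)
    # Step 2: hand over to the fallback policy and roll the closed loop forward.
    for _ in range(horizon):
        x_next = f(x_next, fallback(x_next), adversary(x_next))
        if margin(x_next) < 0:
            return fallback(x)  # predicted failure: override the task action
    return u_task  # rollout stayed safe: let the task action through


# Hypothetical toy system: unstable drift, |u| <= 1, |d| <= 0.5, failure when |x| > 1.
DT = 0.05


def f(x: float, u: float, d: float) -> float:
    return x + DT * (x + u + d)


def fallback(x: float) -> float:
    return max(-1.0, min(1.0, -3.0 * x))  # saturated stabilizing feedback


def adversary(x: float) -> float:
    return 0.5 if x > 0 else (-0.5 if x < 0 else 0.0)  # push away from 0


def margin(x: float) -> float:
    return 1.0 - abs(x)


print(shield(0.2, 1.0, f, fallback, adversary, margin))   # 1.0
print(shield(0.45, 1.0, f, fallback, adversary, margin))  # -1.0
```

At x = 0.2 the task action u = 1 is let through, because the fallback policy can still pull the state back. At x = 0.45 the same action would push the state past the point where the bounded fallback control can overcome the drift and the adversarial disturbance, so the shield returns the fallback action instead.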