An offender–defender safety game
Miroslav Krstic
DOI: https://doi.org/10.1016/j.arcontrol.2024.100939
IF: 9.4
2024-03-14
Annual Reviews in Control
Abstract: In this tutorial we study a safety analog of the classical zero-sum differential game with positive definite penalties on the state and the two inputs. Consider a nonlinear system affine in two inputs, which are called "offender" and "defender." Let the inputs have opposing objectives with respect to an infinite-time cost which, in addition to penalizing the inputs of both agents, incorporates a safety index of the system (a barrier function), with the defender aiming to maximize the system safety and the offender aiming to minimize it. If there is a pair of (offender, defender) non-Nash feedback policies of the L_g h form with a safe outcome, namely, one in which the defender maintains safety while the offender fails to violate it, then there exists an inverse optimal pair of policies that attains a Nash equilibrium relative to the safety minimax objective. In the tutorial we study both deterministic and stochastic offenders. The deterministic offender applies its feedback through its deterministic input value, while the stochastic offender applies its feedback through its incremental covariance. In addition to Nash policies for a minimax offender–defender formulation, we provide feedback laws for the defender in the scenario where the offender's action is unrestricted by optimality, and where the defender ensures input-to-state safety in the deterministic and stochastic senses. This tutorial is derived from our recent article on inverse optimal safety filters, by setting the nominal control to zero and declaring the disturbance to be the offender agent. Among several illustrative examples, one is particularly interesting and unconventional. We consider a safety game played on a unicycle vehicle between its two inputs, the angular velocity and the linear velocity, as the opposing players. We consider two scenarios.
In the first, the angular velocity, acting as an offender, attempts to run the vehicle into an obstacle by steering, while the linear velocity, acting as a defender, drives the vehicle forward or in reverse to prevent it from being run into the obstacle. In the second scenario, the linear velocity acts as an offender and the angular velocity acts as a defender (in the deterministic case by varying the heading rate; in the stochastic case by varying the variance of a white noise driving the heading rate). A "wind" towards the obstacle favors the offender in both scenarios. The derived input policies are optimal with respect to their opposing objectives, under the best possible policy of the opponent and under meaningful costs on their actions. The linear velocity input prevails whether it acts as a defender, in which case collision with the obstacle is prevented, or as an offender, in which case collision with the obstacle is achieved.
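As a rough illustration of the general setup described above (a system affine in an offender input and a defender input, a barrier function h as the safety index, and feedback policies of the L_g h form), the Python sketch below simulates a scalar toy game. It assumes that "L_g h form" means each player's feedback is proportional to the Lie derivative of h along that player's input vector field, with the defender's sign chosen to raise h and the offender's sign chosen to lower it. The system, the barrier h(x) = 1 - x^2, the constant wind w, the gains k_u and k_d, and all function names are hypothetical and not taken from the tutorial. The policies shown are not the inverse optimal Nash pair; they are only an example of the kind of non-Nash pair with a safe outcome whose existence the result presupposes.

```python
# Hypothetical toy offender-defender safety game (illustration only, not from the tutorial).
# Scalar system affine in two inputs:
#     xdot = f(x) + g_u(x) * u + g_d(x) * d
# with drift f(x) = w (a constant "wind" toward the unsafe region x > 1),
# input fields g_u = g_d = 1, and barrier h(x) = 1 - x**2 (safe set |x| <= 1).


def h(x: float) -> float:
    """Barrier function: nonnegative exactly on the safe set |x| <= 1."""
    return 1.0 - x * x


def dh_dx(x: float) -> float:
    """Gradient of h; with g_u = g_d = 1 it equals L_{g_u} h = L_{g_d} h."""
    return -2.0 * x


def simulate(x0: float, k_u: float, k_d: float, w: float = 0.5,
             dt: float = 1e-3, T: float = 10.0) -> tuple[float, float]:
    """Forward-Euler rollout under L_g h-form feedback for both players.

    Returns the final state and the minimum of h along the trajectory.
    """
    x, h_min = x0, h(x0)
    for _ in range(int(round(T / dt))):
        Lg_h = dh_dx(x)      # Lie derivative of h along g_u = g_d = 1
        u = k_u * Lg_h       # defender: contributes k_u * Lg_h**2 >= 0 to hdot
        d = -k_d * Lg_h      # offender: contributes -k_d * Lg_h**2 <= 0 to hdot
        x += dt * (w + u + d)
        h_min = min(h_min, h(x))
    return x, h_min


def boundary_condition_holds(k_u: float, k_d: float, w: float) -> bool:
    """Check hdot >= 0 at both boundary points x = +1 and x = -1.

    There hdot = -2*x*w + 4*x**2*(k_u - k_d), whose smaller value over
    x = +/-1 is 4*(k_u - k_d) - 2*|w|. For this scalar example, that keeps
    the interval |x| <= 1 forward invariant.
    """
    return 4.0 * (k_u - k_d) - 2.0 * abs(w) >= 0.0
```

With w = 0.5, k_u = 2 and k_d = 1, the closed loop is xdot = 0.5 - 2x, which settles at x = 0.25 inside the safe set even though the wind pushes toward the obstacle. Swapping to k_u = 1 and k_d = 3 gives xdot = 0.5 + 4x, and the state leaves |x| <= 1, so h becomes negative.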
automation & control systems