Fail-Safe Adversarial Generative Imitation Learning

Philipp Geiger, Christoph-Nikolas Straehle
2023-07-28
Abstract: For flexible yet safe imitation learning (IL), we propose theory and a modular method, with a safety layer that enables a closed-form probability density/gradient of the safe generative continuous policy, end-to-end generative adversarial training, and worst-case safety guarantees. The safety layer maps all actions into a set of safe actions, and uses the change-of-variables formula plus additivity of measures for the density. The set of safe actions is inferred by first checking safety of a finite sample of actions via adversarial reachability analysis of fallback maneuvers, and then concluding on the safety of these actions' neighborhoods using, e.g., Lipschitz continuity. We provide theoretical analysis showing the robustness advantage of using the safety layer already during training (imitation error linear in the horizon) compared to only using it at test time (up to quadratic error). In an experiment on real-world driver interaction data, we empirically demonstrate tractability, safety and imitation performance of our approach.
Machine Learning, Multiagent Systems
What problem does this paper attempt to address?
The paper primarily aims to address two key issues in Imitation Learning (IL):

1. **Safety**: Ensuring that the learned policy meets specified safety constraints during execution. This matters most in human-interactive or multi-agent environments, where the system's behavior must not lead to unsafe states or accidents.
2. **Robustness**: Improving how well imitation learning copes with situations outside the training data distribution, for example maintaining good performance when the simulated horizon is longer than the training trajectories.

To achieve these goals, the paper proposes a method called "Fail-Safe Adversarial Generative Imitation Learning" (FAGIL). Its main contributions can be summarized as follows:

- **Design of the Safety Layer**: A simple yet flexible differentiable safety layer maps all actions, including potentially unsafe ones, into a set of actions known to be safe. Using the change-of-variables formula together with additivity of measures, the probability density of the overall safe policy and its gradient are available in closed form, which enables end-to-end generative adversarial training.
- **Inference of Safe Action Sets**: Two sample-based methods for inferring safe action sets are provided. For a given state, they first check the safety of a finite sample of actions via adversarial reachability analysis of fallback maneuvers, and then extend the result to the neighborhoods of these actions using properties such as Lipschitz continuity or convexity.
- **Theoretical Analysis**: The paper compares using the safety layer already during training with using it only at test time. With the layer in place during training, the imitation error grows linearly in the horizon; when it is added only at test time, the error can grow up to quadratically.
- **Experimental Validation**: The method's tractability, safety, and imitation performance are evaluated in an experiment on real-world highway driving data, where it achieves performance close to the unconstrained baseline while maintaining safety.
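The closed-form density of the safety layer can be illustrated with a one-dimensional toy example. The sketch below is not the authors' implementation: the cell layout, the safety labels in `SAFE`, the pre-safe density `base_pdf`, and all function names are assumptions chosen for illustration. Each cell of the action interval is translated onto its nearest safe cell, so every piece of the map has unit Jacobian, and the safe density is the sum of the base density over all pre-images of a safe action.

```python
import numpy as np

K = 4                                          # number of equal-width cells on [0, 1)
WIDTH = 1.0 / K
SAFE = np.array([False, True, True, False])    # hypothetical per-cell safety labels
SAFE_IDX = np.flatnonzero(SAFE)
# For every cell, the index of its nearest safe cell (a safe cell maps to itself).
TARGET = np.array([SAFE_IDX[np.argmin(np.abs(SAFE_IDX - k))] for k in range(K)])


def base_pdf(a):
    """Toy pre-safe policy density p(a) = 2a on [0, 1), zero elsewhere."""
    a = np.asarray(a, dtype=float)
    return np.where((a >= 0.0) & (a < 1.0), 2.0 * a, 0.0)


def cell_of(a):
    """Index of the cell that contains each action in [0, 1)."""
    return np.minimum((np.asarray(a, dtype=float) / WIDTH).astype(int), K - 1)


def safety_layer(a):
    """Translate each action into its target safe cell, keeping its offset in the cell."""
    a = np.asarray(a, dtype=float)
    cell = cell_of(a)
    return a + (TARGET[cell] - cell) * WIDTH


def safe_pdf(a):
    """Density of safety_layer(A) for A distributed according to base_pdf.

    Each piece of the map is a translation (|Jacobian| = 1), so the
    change-of-variables formula applied per piece, summed over pieces
    (additivity of measures), gives a sum of shifted base densities.
    """
    a = np.asarray(a, dtype=float)
    cell = cell_of(a)
    total = np.zeros_like(a)
    for k in range(K):
        pre_image = a + (k - TARGET[k]) * WIDTH   # point of cell k that lands on a
        total += np.where(cell == TARGET[k], base_pdf(pre_image), 0.0)
    return total
```

Because `safe_pdf` is built only from shifted copies of `base_pdf`, it stays differentiable in the parameters of the base policy wherever the base density is, which is the property that makes gradient-based adversarial training through such a layer possible.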
In summary, the paper addresses safety and robustness in imitation learning and supports the proposed method with theoretical analysis and an experiment on real-world driving data.
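The step from a finite sample of checked actions to whole neighborhoods can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: `toy_margin` is a hypothetical stand-in for the worst-case safety margin that a reachability analysis of fallback maneuvers would return, and its Lipschitz constant is assumed to be known. If the margin at a box center exceeds the Lipschitz constant times the box's half-diagonal, the margin is positive everywhere in the box, so the whole box can be certified as safe.

```python
import numpy as np


def certify_safe_boxes(margin_fn, lipschitz_const, low, high, cells_per_dim):
    """Certify grid boxes as safe from one margin evaluation per box center.

    margin_fn(action) > 0 is taken to mean "action is safe"; margin_fn is
    assumed to be Lipschitz in the Euclidean norm with constant lipschitz_const.
    Returns the box centers, a boolean mask of certified boxes, and the side lengths.
    """
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    side = (high - low) / cells_per_dim
    axes = [lo + sd * (np.arange(cells_per_dim) + 0.5) for lo, sd in zip(low, side)]
    centers = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, low.size)
    half_diag = 0.5 * np.linalg.norm(side)       # largest distance from center to a box point
    margins = np.array([margin_fn(c) for c in centers])
    # Lipschitz bound: margin(a) >= margin(center) - L * |a - center| for all a in the box.
    certified = margins - lipschitz_const * half_diag > 0.0
    return centers, certified, side


def toy_margin(action):
    """Hypothetical safety margin: positive inside a disc of radius 0.8 (1-Lipschitz)."""
    return 0.8 - np.linalg.norm(action)


centers, certified, side = certify_safe_boxes(
    toy_margin, lipschitz_const=1.0, low=[-1.0, -1.0], high=[1.0, 1.0], cells_per_dim=10
)
```

The certificate is conservative: boxes that straddle the boundary of the true safe set are rejected even if their centers are safe, and a finer grid recovers more of the safe set at the cost of more margin evaluations.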