Benchmarking Safe Deep Reinforcement Learning in Aquatic Navigation

Enrico Marchesini, Davide Corsi, Alessandro Farinelli
DOI: https://doi.org/10.48550/arXiv.2112.10593
2021-12-17
Abstract: We propose a novel benchmark environment for Safe Reinforcement Learning focusing on aquatic navigation. Aquatic navigation is an extremely challenging task due to the non-stationary environment and the uncertainties of the robotic platform, hence it is crucial to consider the safety aspect of the problem, by analyzing the behavior of the trained network to avoid dangerous situations (e.g., collisions). To this end, we consider a value-based and policy-gradient Deep Reinforcement Learning (DRL) and we propose a crossover-based strategy that combines gradient-based and gradient-free DRL to improve sample-efficiency. Moreover, we propose a verification strategy based on interval analysis that checks the behavior of the trained models over a set of desired properties. Our results show that the crossover-based training outperforms prior DRL approaches, while our verification allows us to quantify the number of configurations that violate the behaviors that are described by the properties. Crucially, this will serve as a benchmark for future research in this domain of applications.
Machine Learning, Artificial Intelligence, Robotics
What problem does this paper attempt to address?
The paper sets out to provide a benchmark environment for safe deep reinforcement learning (DRL) in aquatic navigation. Specifically, the researchers ask how autonomous aquatic vehicles (such as aquatic drones) can navigate safely in non-stationary, uncertain aquatic environments. This includes avoiding dangerous situations such as collisions and ensuring that the trained models behave robustly in practical applications.

### Problems Mainly Solved in the Paper

1. **Safe Navigation in Non-stationary Environments**:
   - The aquatic environment is highly dynamic and uncertain. For example, waves make it difficult for traditional geometric or model-based techniques to cope with this complexity. The researchers therefore propose a crossover-based strategy that combines gradient-based and gradient-free deep reinforcement learning to improve sample efficiency and performance in this setting.
2. **Model Behavior Verification**:
   - To ensure that the trained policy does not lead to dangerous situations, the researchers introduce a verification method based on interval analysis that checks whether the trained model satisfies a set of desired behavioral properties. The method quantifies the number of configurations that violate these properties, which provides a benchmark for future research.
3. **Comprehensive Evaluation Framework**:
   - A new aquatic drone simulator is proposed, which simulates water surface waves and other physical phenomena. On this platform, the researchers test different types of deep reinforcement learning algorithms (value-based and policy-gradient) and develop a set of formal verification tools to evaluate the safety and reliability of these algorithms.
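The interval-analysis check in item 2 can be illustrated with a minimal sketch. It is not the paper's verification tool: the tiny network `LAYERS`, the property ("the turn output provably exceeds the forward output"), the uniform splitting scheme, and all function names are assumptions made for illustration. The sketch pushes an input box through a ReLU network with interval arithmetic, then splits the input domain into cells and reports the fraction of cells on which the property could not be proven.

```python
from itertools import product

def affine_bounds(lo, hi, weights, bias):
    """Bounds of y = W x + b when x lies in the box [lo, hi]."""
    out_lo, out_hi = [], []
    for row, b in zip(weights, bias):
        l = h = b
        for w, xl, xh in zip(row, lo, hi):
            # A non-negative weight keeps the bound order; a negative one swaps it.
            l += w * xl if w >= 0 else w * xh
            h += w * xh if w >= 0 else w * xl
        out_lo.append(l)
        out_hi.append(h)
    return out_lo, out_hi

def network_bounds(lo, hi, layers):
    """Propagate a box through (weights, bias) layers, with ReLU between layers."""
    for i, (weights, bias) in enumerate(layers):
        lo, hi = affine_bounds(lo, hi, weights, bias)
        if i < len(layers) - 1:
            lo = [max(0.0, v) for v in lo]
            hi = [max(0.0, v) for v in hi]
    return lo, hi

def unproven_fraction(lo, hi, layers, safe, unsafe, splits=4):
    """Fraction of input cells where 'output[safe] > output[unsafe]' is not proven."""
    steps = [(h - l) / splits for l, h in zip(lo, hi)]
    unproven = 0
    for idx in product(range(splits), repeat=len(lo)):
        cell_lo = [l + i * s for l, i, s in zip(lo, idx, steps)]
        cell_hi = [l + (i + 1) * s for l, i, s in zip(lo, idx, steps)]
        out_lo, out_hi = network_bounds(cell_lo, cell_hi, layers)
        # Proven only if the worst case of 'safe' beats the best case of 'unsafe'.
        if out_lo[safe] <= out_hi[unsafe]:
            unproven += 1
    return unproven / splits ** len(lo)

# Hypothetical 2-input network: output 0 = "forward", output 1 = "turn".
LAYERS = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 0.5], [-1.0, 0.0]], [0.0, 1.0]),
]

rate = unproven_fraction([0.0, 0.0], [0.5, 1.0], LAYERS, safe=1, unsafe=0)
print(f"cells not proven safe: {rate:.0%}")
```

Because interval bounds over-approximate the network's outputs, the reported fraction is an upper bound on the true share of violating inputs. Finer splits tighten it at the cost of more cells to evaluate.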
### Key Contributions

- **Crossover-based Training**: Crossover operators combine gradient-based and gradient-free methods, improving sample efficiency and performance, especially in complex aquatic environments.
- **Formal Verification Extension**: Existing interval analysis tools are extended to evaluate the core behavioral properties of aquatic navigation in parallel and to compute the proportion of configurations that violate them.
- **Benchmark Environment Construction**: A new, physically realistic aquatic navigation environment is created as a benchmark for future research.

In conclusion, the paper aims to advance deep reinforcement learning in practical applications, especially those that require high reliability and safety, by constructing a benchmark environment for safe aquatic navigation.
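The combination of gradient-based and gradient-free learning named in the key contributions can be sketched on a toy problem. This is a generic illustration, not the paper's operator: the `fitness` function stands in for an episode return, `gradient_step` stands in for a DRL update, and the mean crossover, population size, and noise scale are all assumptions. A gradient-trained parameter vector is periodically perturbed into a population, the two fittest individuals are averaged into a child, and the result replaces the current parameters only if it scores higher.

```python
import random

# Hypothetical optimum of the toy objective (assumption for illustration).
TARGET = [1.0, -2.0, 0.5]

def fitness(params):
    """Stand-in for an episode return: higher is better, maximal at TARGET."""
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))

def gradient_step(params, lr=0.1):
    """One gradient-ascent step on the toy fitness (stands in for a DRL update)."""
    return [p + lr * (-2.0 * (p - t)) for p, t in zip(params, TARGET)]

def mutate(params, rng, sigma=0.3):
    """Gradient-free exploration: Gaussian noise on every parameter."""
    return [p + rng.gauss(0.0, sigma) for p in params]

def mean_crossover(parent_a, parent_b):
    """Child whose parameters are the element-wise mean of two parents."""
    return [(a + b) / 2.0 for a, b in zip(parent_a, parent_b)]

def crossover_phase(params, rng, pop_size=8):
    """Return the best of: current parameters, fittest mutant, crossover child."""
    population = sorted((mutate(params, rng) for _ in range(pop_size)),
                        key=fitness, reverse=True)
    child = mean_crossover(population[0], population[1])
    return max([params, population[0], child], key=fitness)

def train(params, steps=20, period=5, seed=0):
    """Gradient steps with a crossover phase every `period` steps."""
    rng = random.Random(seed)
    for step in range(1, steps + 1):
        params = gradient_step(params)
        if step % period == 0:
            params = crossover_phase(params, rng)
    return params

start = [0.0, 0.0, 0.0]
print(fitness(start), fitness(train(start)))
```

Keeping the current parameters among the candidates means a crossover phase can never lower the score. That is one simple way to let population-based search help the gradient learner without destabilizing it.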