Disentangling Uncertainty for Safe Social Navigation using Deep Reinforcement Learning

Daniel Flögel, Marcos Gómez Villafañe, Joshua Ransiek, Sören Hohmann
2024-09-17
Abstract: Autonomous mobile robots are increasingly employed in pedestrian-rich environments where safe navigation and appropriate human interaction are crucial. While Deep Reinforcement Learning (DRL) enables socially integrated robot behavior, challenges persist in novel or perturbed scenarios to indicate when and why the policy is uncertain. Unknown uncertainty in decision-making can lead to collisions or human discomfort and is one reason why safe and risk-aware navigation is still an open problem. This work introduces a novel approach that integrates aleatoric, epistemic, and predictive uncertainty estimation into a DRL-based navigation framework for uncertainty estimates in decision-making. We, therefore, incorporate Observation-Dependent Variance (ODV) and dropout into the Proximal Policy Optimization (PPO) algorithm. For different types of perturbations, we compare the ability of Deep Ensembles and Monte-Carlo Dropout (MC-Dropout) to estimate the uncertainties of the policy. In uncertain decision-making situations, we propose to change the robot's social behavior to conservative collision avoidance. The results show that the ODV-PPO algorithm converges faster with better generalization and disentangles the aleatoric and epistemic uncertainties. In addition, the MC-Dropout approach is more sensitive to perturbations and capable to correlate the uncertainty type to the perturbation type better. With the proposed safe action selection scheme, the robot can navigate in perturbed environments with fewer collisions.
Robotics, Artificial Intelligence, Systems and Control
What problem does this paper attempt to address?
The problem this paper addresses is safe navigation for autonomous mobile robots in pedestrian-dense environments. Although deep reinforcement learning (DRL) can enable robots to exhibit socially integrated behavior, in novel or perturbed scenarios the policy gives no indication of when and why it is uncertain, and this unknown uncertainty in decision-making may lead to collisions or human discomfort. It is one reason why safe and risk-aware navigation remains an open problem. The paper therefore proposes a method that integrates **aleatoric, epistemic, and predictive uncertainty estimates** into a DRL-based navigation framework.

### Core Objectives of the Paper

1. **Integrate Uncertainty Estimation**: Incorporate Observation-Dependent Variance (ODV) and dropout into the Proximal Policy Optimization (PPO) algorithm to obtain uncertainty estimates for the policy's decisions.
2. **Compare Uncertainty Estimation Methods**: Compare how well Deep Ensembles and Monte-Carlo Dropout (MC-Dropout) estimate the policy's uncertainties under different types of perturbations.
3. **Adjust Robot Behavior**: In uncertain decision-making situations, change the robot's social behavior to conservative collision avoidance to reduce collisions and negative impacts on humans.
4. **Verify the Effectiveness of the Method**: Demonstrate in simulation experiments that the proposed architecture handles perturbed environments.

### Main Contributions

1. **Enhance the PPO Algorithm**: Extend PPO to handle observation-dependent variance while maintaining stable training and a good balance between exploration and exploitation.
2. **Compare Uncertainty Estimation Methods**: Evaluate how well MC-Dropout and Deep Ensembles separate aleatoric and epistemic uncertainties.
3. **Improve Safety**: Propose changing the robot's behavior in uncertain human-machine interaction situations to reduce collisions and negative impacts.
4. **Simulation Verification**: Verify the effectiveness of the proposed method through simulation experiments.

### Background and Motivation

- **Challenges in Pedestrian-Dense Environments**: The main challenge for autonomous mobile robots in pedestrian-dense environments is handling the randomness of human behavior and unseen scenarios.
- **Limitations of Existing Methods**: Most DRL-based navigation methods do not consider out-of-distribution (OOD) scenarios or noise, which leads to poor performance in these cases.
- **Importance of Uncertainty**: Distinguishing and accounting for uncertainty in the decision-making process is crucial for the successful deployment of autonomous robots in the real world.

### Method Overview

1. **Observation-Dependent Variance (ODV)**: A linear layer is added to the PPO policy to output an observation-dependent variance, so that both the mean and the variance of the policy depend on the observation.
2. **Uncertainty Estimation**:
   - **MC-Dropout**: Estimate epistemic and aleatoric uncertainties by applying dropout in the network to approximate Bayesian inference.
   - **Deep Ensembles**: Estimate uncertainty by training multiple networks with different initial weights.
3. **Probabilistic Collision (POC) Estimation**: Based on the uncertainty measurements, a risk-aware action-selection scheme is proposed to avoid collisions in high-uncertainty situations.

### Experimental Results

- **Training and Generalization**: The ODV-PPO method converges faster during training and handles perturbations and OOD environments better.
- **Uncertainty Separation**: The MC-Dropout method can effectively separate the sources of uncertainty, while the Deep Ensembles method always shows high epistemic uncertainty and cannot distinguish between aleatoric and epistemic uncertainties.
- **Safety Improvement**: By switching to a conservative collision-avoidance strategy in high-uncertainty situations, the robot navigates perturbed environments with fewer collisions.

In conclusion, this paper aims to improve the safe navigation ability of autonomous mobile robots in pedestrian-dense environments by introducing and comparing different uncertainty estimation methods.
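
The uncertainty separation described under Method Overview can be illustrated with a minimal NumPy sketch. It assumes a Gaussian policy whose mean and variance are sampled over several stochastic forward passes (MC-Dropout) or ensemble members, and it uses the common law-of-total-variance split, which may differ from the paper's exact estimator. The function names, the thresholds, and the simple threshold rule in `select_action` are hypothetical stand-ins; the rule is not the paper's POC-based scheme.

```python
import numpy as np


def decompose_uncertainty(means, variances):
    """Split the predictive variance of a Gaussian policy into two parts.

    means, variances: arrays of shape (T, action_dim), one row per stochastic
    forward pass (MC-Dropout) or per ensemble member (Deep Ensembles).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    aleatoric = variances.mean(axis=0)  # average of the predicted variances
    epistemic = means.var(axis=0)       # disagreement between predicted means
    predictive = aleatoric + epistemic  # law of total variance
    return aleatoric, epistemic, predictive


def select_action(social_action, conservative_action, aleatoric, epistemic,
                  aleatoric_max=0.5, epistemic_max=0.5):
    """Hypothetical fallback rule: use the conservative action when either
    uncertainty exceeds its (assumed) threshold in any action dimension."""
    uncertain = (np.max(aleatoric) > aleatoric_max
                 or np.max(epistemic) > epistemic_max)
    return conservative_action if uncertain else social_action


# Two forward passes over a 2-D action (for example, linear and angular velocity).
means = [[0.0, 1.0], [2.0, 1.0]]
variances = [[0.1, 0.2], [0.3, 0.2]]
aleatoric, epistemic, predictive = decompose_uncertainty(means, variances)
action = select_action("social", "conservative", aleatoric, epistemic)
```

In this toy input the two passes disagree strongly on the first action dimension, so the epistemic term dominates there and the fallback rule picks the conservative action. In practice the thresholds would have to be tuned for the policy and environment at hand.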