Socially compliant mobile robot navigation via inverse reinforcement learning

Henrik Kretzschmar, Markus Spies, Christoph Sprunk, Wolfram Burgard
DOI: https://doi.org/10.1177/0278364915619772
2016-07-11
The International Journal of Robotics Research
Abstract: Mobile robots are increasingly populating our human environments. To interact with humans in a socially compliant way, these robots need to understand and comply with mutually accepted rules. In this paper, we present a novel approach to model the cooperative navigation behavior of humans. We model their behavior in terms of a mixture distribution that captures both the discrete navigation decisions, such as going left or going right, as well as the natural variance of human trajectories. Our approach learns the model parameters of this distribution that match, in expectation, the observed behavior in terms of user-defined features. To compute the feature expectations over the resulting high-dimensional continuous distributions, we use Hamiltonian Markov chain Monte Carlo sampling. Furthermore, we rely on a Voronoi graph of the environment to efficiently explore the space of trajectories from the robot’s current position to its target position. Using the proposed model, our method is able to imitate the behavior of pedestrians or, alternatively, to replicate a specific behavior that was taught by tele-operation in the target environment of the robot. We implemented our approach on a real mobile robot and demonstrated that it is able to successfully navigate in an office environment in the presence of humans. An extensive set of experiments suggests that our technique outperforms state-of-the-art methods to model the behavior of pedestrians, which also makes it applicable to fields such as behavioral science or computer graphics.
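One common way to write the feature-matching condition mentioned in the abstract is the maximum entropy form below. The notation is an assumption made here for illustration and is not taken from the paper: $x$ is a trajectory, $f(x)$ the vector of user-defined features, $\theta$ the weight vector, and $\mathcal{D}$ the set of observed trajectories.

$$
p_\theta(x) = \frac{1}{Z(\theta)} \exp\bigl(-\theta^\top f(x)\bigr),
\qquad
\mathbb{E}_{p_\theta}\bigl[f(x)\bigr] = \frac{1}{|\mathcal{D}|} \sum_{x_k \in \mathcal{D}} f(x_k)
$$

Under this form, the gradient of the average log-likelihood of the observations with respect to the weights is the difference between the two sides of the matching condition, which is why the feature expectations have to be computed repeatedly during learning:

$$
\nabla_\theta \, \frac{1}{|\mathcal{D}|} \sum_{x_k \in \mathcal{D}} \log p_\theta(x_k)
= \mathbb{E}_{p_\theta}\bigl[f(x)\bigr] - \frac{1}{|\mathcal{D}|} \sum_{x_k \in \mathcal{D}} f(x_k)
$$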
What problem does this paper attempt to address?
This paper addresses the problem of enabling mobile robots to navigate in human environments in a way that complies with social norms. Specifically, the authors propose a new method to model the cooperative navigation behavior of humans, so that the robot can understand and follow commonly accepted social rules and avoid disturbing the people around it. The model is learned from observations of human behavior and captures both the discrete and continuous decisions that humans make while navigating and the trajectories that result from these decisions. The robot can then predict the behavior of multiple agents in new situations and react accordingly, which leads to more natural, socially compliant navigation.

### Main contributions of the paper

1. **A probabilistic framework**: The framework learns the behavior of interacting agents (such as pedestrians) from demonstrations. Its key challenge is the so-called forward problem: for a given model, computing the feature expectations with respect to a distribution over a high-dimensional continuous trajectory space. To this end, the authors use Markov chain Monte Carlo (MCMC) sampling and exploit the highly structured nature of the distribution over the observed trajectories of interacting agents.
2. **Efficient estimation of feature expectations**: The Hybrid (Hamiltonian) Monte Carlo (HMC) algorithm is used to estimate the feature expectations, which works for any differentiable features. This lets the model capture the stochasticity of the observed trajectories, whereas existing methods often learn deterministic models and cannot replicate the stochastic behavior of natural agents well.
3. **Path exploration in the environment**: A Voronoi graph of the environment is used to efficiently explore the space of trajectories from the robot's current position to its target position.
4. **Experimental verification**: The effectiveness of the method is verified in a series of experiments, including a Turing test. The results show that the trajectories generated by this method are considered more human-like than those generated by other methods. The performance of the method in real robot navigation tasks is also evaluated.

### Method overview

- **Maximum entropy inverse reinforcement learning**: Based on the maximum entropy principle, the probability distribution is learned by matching feature expectations. The method assumes that the observed trajectories are samples drawn from a probability distribution that depends on a set of features.
- **Modeling continuous navigation decisions**: Splines represent the trajectories of the agents, which converts the infinite-dimensional continuous trajectory space into a finite-dimensional one. Appropriate features (such as time, acceleration, etc.) capture the physical properties of navigation behavior.
- **Feature expectation calculation**: The HMC algorithm computes the feature expectations, which are used to find the model parameters whose feature expectations match those of the observed data.

### Conclusion

This paper addresses socially compliant navigation of mobile robots in human environments by proposing a new probabilistic framework. By learning from human navigation behavior, robots can better understand and predict the behavior of pedestrians and thereby navigate more naturally and efficiently.
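
The feature-matching loop described in the method overview can be illustrated with a minimal sketch. It is a toy under stated assumptions and not the authors' implementation: a trajectory is reduced to a short list of lateral offsets with fixed endpoints (instead of 2-D splines), only two quadratic features are used, the mixture over discrete decisions and the Voronoi graph are left out, and the weights are updated by plain gradient ascent. All function names and parameters (`features`, `grad_energy`, `hmc_feature_expectations`, `fit_weights`, step sizes, sample counts) are hypothetical.

```python
import math
import random


def second_diff(v):
    """Second differences of v, assuming fixed zero values at both ends."""
    p = [0.0] + list(v) + [0.0]
    return [p[i - 1] - 2.0 * p[i] + p[i + 1] for i in range(1, len(p) - 1)]


def features(x):
    """Toy feature vector of a path given by its interior lateral offsets x.

    f[0]: sum of squared first differences  (velocity-like cost)
    f[1]: sum of squared second differences (acceleration-like cost)
    """
    p = [0.0] + list(x) + [0.0]
    vel = [b - a for a, b in zip(p, p[1:])]
    acc = second_diff(x)
    return [sum(v * v for v in vel), sum(a * a for a in acc)]


def grad_energy(x, theta):
    """Gradient of the energy U(x) = theta . features(x) with respect to x."""
    acc = second_diff(x)      # d f[0] / dx = -2 * acc
    bend = second_diff(acc)   # d f[1] / dx =  2 * second_diff(acc)
    return [-2.0 * theta[0] * a + 2.0 * theta[1] * b for a, b in zip(acc, bend)]


def hmc_feature_expectations(theta, n_points, n_samples, rng,
                             eps=0.1, max_leapfrog=16, burn_in=50):
    """Estimate E[features(x)] under p(x) ~ exp(-theta . features(x)) with HMC."""
    def energy(x):
        return sum(t * f for t, f in zip(theta, features(x)))

    x = [0.0] * n_points
    total = [0.0, 0.0]
    for it in range(burn_in + n_samples):
        p = [rng.gauss(0.0, 1.0) for _ in x]          # resample momentum
        h_old = energy(x) + 0.5 * sum(pi * pi for pi in p)
        x_new = list(x)
        # Leapfrog integration; the step count is jittered to avoid resonances.
        steps = rng.randint(max_leapfrog // 2, max_leapfrog)
        g = grad_energy(x_new, theta)
        p = [pi - 0.5 * eps * gi for pi, gi in zip(p, g)]
        for step in range(steps):
            x_new = [xi + eps * pi for xi, pi in zip(x_new, p)]
            g = grad_energy(x_new, theta)
            scale = eps if step < steps - 1 else 0.5 * eps
            p = [pi - scale * gi for pi, gi in zip(p, g)]
        h_new = energy(x_new) + 0.5 * sum(pi * pi for pi in p)
        # Metropolis correction for the integration error.
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        if it >= burn_in:
            total = [t + f for t, f in zip(total, features(x))]
    return [t / n_samples for t in total]


def fit_weights(demo_features, n_points, rng, iters=30, lr=0.3, n_samples=100):
    """Gradient ascent on the demonstration likelihood (feature matching)."""
    theta = [0.5, 0.5]
    for _ in range(iters):
        model = hmc_feature_expectations(theta, n_points, n_samples, rng)
        # Likelihood gradient: model expectation minus demonstrated features.
        # Weights are kept positive so the toy density stays normalizable.
        theta = [max(1e-2, t + lr * (m - d))
                 for t, m, d in zip(theta, model, demo_features)]
    return theta
```

To try it, average `features` over a set of demonstrated offset lists and pass the result to `fit_weights` together with a seeded `random.Random`. The learned weights need not equal the weights that generated the demonstrations: different weight vectors can produce nearly the same feature expectations, and only the feature expectations are matched.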