Visual Representations for Semantic Target Driven Navigation

Arsalan Mousavian,Alexander Toshev,Ayzaan Wahid,James Davidson,Marek Fiser,Jana Kosecka
DOI: https://doi.org/10.1109/icra.2019.8793493
2019-05-01
Abstract:What is a good visual representation for navigation? We study this question in the context of semantic visual navigation, which is the problem of a robot finding its way through a previously unseen environment to a target object, e.g. go to the refrigerator. Instead of acquiring a metric semantic map of an environment and using planning for navigation, our approach learns navigation policies on top of representations that capture spatial layout and semantic contextual cues. We propose to use semantic segmentation and detection masks as observations obtained by state-of-the-art computer vision algorithms and use a deep network to learn the navigation policy. The availability of equitable representations in simulated environments enables joint training using real and simulated data and alleviates the need for domain adaptation or domain randomization commonly used to tackle the sim-to-real transfer of the learned policies. Both the representation and the navigation policy can be readily applied to real non-synthetic environments as demonstrated on the Active Vision Dataset [1]. Our approach successfully gets to the target in 54% of the cases in unexplored environments, compared to 46% for a non-learning based approach, and 28% for a learning-based baseline.
What problem does this paper attempt to address?