End-to-end Visual Navigation with Intrinsic Motivation in 3D Maze-like Environments

Peng Li, Xiao-Gang Ruan, Xiao-Qing Zhu
DOI: https://doi.org/10.1109/cac53003.2021.9728285
2021-01-01
Abstract: Learning to navigate in visually rich environments with sparse rewards is a long-standing problem in developing AI agents. In this paper, inspired by spontaneous exploration behaviour in animals, we propose an end-to-end visual navigation method that uses intrinsic reward to motivate an explore-exploit strategy. First, to obtain a control policy directly from pixel-level input and enable end-to-end training, we adopt deep reinforcement learning as the basic navigation framework. Second, we allow the agent to create rewards for itself. The intrinsic reward is tied to episodic memory and has two parts: (1) the number of times the agent reaches each state is recorded, and these counts are used to compute a bonus following the classic count-based method; (2) temporal distance is also used as a basis for distributing the bonus, with its magnitude determined by the number of environment steps between the current observation and those stored in memory. Finally, this intrinsic reward is added to the real task reward, so that the agent can learn its control policy from the combined reward. We test our approach in 3D maze-like environments in DMLab and validate its navigation performance under different reward scenarios. The experimental results show that our approach learns effective goal-driven behaviour from raw sensory input and outperforms the baselines across all tasks. Furthermore, agents equipped with our intrinsic motivation appear to explore faster in environments with no reward.
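The two-part intrinsic reward described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `EpisodicCuriosity`, the use of hashable state keys (e.g. discretized positions or hashed observation embeddings), the common β/√N(s) form of the count-based bonus, the linear temporal-distance ramp capped at `horizon` steps, and all coefficient values are hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class EpisodicCuriosity:
    """Hypothetical episodic-memory bonus with a count term and a
    temporal-distance term. States are assumed to be hashable keys."""
    beta: float = 0.1    # scale of the count-based bonus (assumed value)
    alpha: float = 0.1   # scale of the temporal-distance bonus (assumed value)
    horizon: int = 50    # steps after which a revisit earns the full time bonus (assumed)
    counts: dict = field(default_factory=dict)     # state -> visit count N(s)
    last_seen: dict = field(default_factory=dict)  # state -> step of most recent visit

    def reset(self) -> None:
        # Episodic memory is cleared at the start of every episode.
        self.counts.clear()
        self.last_seen.clear()

    def bonus(self, state, step: int) -> float:
        # Record the visit and read the updated count N(s).
        n = self.counts.get(state, 0) + 1
        self.counts[state] = n
        # (1) Count-based term (assumed form): beta / sqrt(N(s)).
        r_count = self.beta / math.sqrt(n)
        # (2) Temporal-distance term (assumed form): grows with the number of
        # environment steps since the state was last stored; unseen states
        # receive the maximum value alpha.
        if state in self.last_seen:
            gap = step - self.last_seen[state]
            r_time = self.alpha * min(1.0, gap / self.horizon)
        else:
            r_time = self.alpha
        self.last_seen[state] = step
        return r_count + r_time


def combined_reward(extrinsic: float, intrinsic: float) -> float:
    # The policy is trained on the sum of task reward and intrinsic reward.
    return extrinsic + intrinsic


# Usage: a revisit shortly after the first visit earns a smaller bonus.
memory = EpisodicCuriosity()
memory.reset()
r_first = memory.bonus((3, 4), step=0)     # first visit: 0.1 + 0.1
r_revisit = memory.bonus((3, 4), step=10)  # 0.1 / sqrt(2) + 0.1 * 10 / 50
total = combined_reward(0.0, r_revisit)    # sparse task reward is often 0
```

Summing the two terms keeps the sketch close to the abstract's description; other combinations (e.g. multiplying them, or normalizing each term) would be equally plausible readings and are not confirmed by the source.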