VME-Transformer: Enhancing Visual Memory Encoding for Navigation in Interactive Environments

Jiwei Shen, Pengjie Lou, Liang Yuan, Shujing Lyu, Yue Lu
DOI: https://doi.org/10.1109/lra.2023.3333238
IF: 5.2
2023-01-01
IEEE Robotics and Automation Letters
Abstract: The efficiency of a robotic system is primarily determined by its ability to navigate complex and interactive environments. In real-world scenarios, cluttered surroundings are common, requiring a robot to traverse diverse spaces and displace objects to clear a path toward its goal. Consequently, “Visual Interactive Navigation” presents several challenges, including how to retain historical exploration information from partially observable visual signals, and how to use sparse rewards in reinforcement learning to simultaneously learn a latent representation and a control policy. To address these challenges, we introduce a Transformer-based Visual Memory Encoder (VME-Transformer), capable of embedding both recent and long-term exploration information into memory. Additionally, we explicitly estimate the robot's next pose, conditioned on the impending action, to bootstrap the learning process of the high-capacity VME-Transformer. We further regularize the value function by introducing input perturbations, thereby enhancing its generalization to previously unseen environments. In the Visual Interactive Navigation tasks within the iGibson environment, the VME-Transformer outperforms state-of-the-art methods, underlining its effectiveness.
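To make the value-regularization idea more concrete, here is a minimal sketch, not the authors' implementation. It assumes a toy linear value head and Gaussian noise added to the input features, and it penalizes disagreement between V(s) and V(s + noise) alongside the usual regression loss on returns. The function names and the `sigma` and `lam` hyperparameters are hypothetical; the paper's actual perturbation scheme and loss may differ.

```python
import random


def linear_value(weights, bias, features):
    """Toy linear value head V(s) = w . s + b (hypothetical stand-in for the real critic)."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def perturb(features, sigma, rng):
    """Add isotropic Gaussian noise to each input feature (one assumed perturbation scheme)."""
    return [x + rng.gauss(0.0, sigma) for x in features]


def regularized_value_loss(weights, bias, batch, returns, sigma=0.1, lam=0.5, seed=0):
    """Mean-squared error to the returns plus a consistency penalty between
    V(s) and V(s + noise). `sigma` and `lam` are hypothetical hyperparameters."""
    rng = random.Random(seed)  # seeded so the perturbations are reproducible
    fit_sum = 0.0
    reg_sum = 0.0
    for features, target in zip(batch, returns):
        v_clean = linear_value(weights, bias, features)
        v_noisy = linear_value(weights, bias, perturb(features, sigma, rng))
        fit_sum += (v_clean - target) ** 2   # regression term on clean inputs
        reg_sum += (v_noisy - v_clean) ** 2  # smoothness term under input noise
    n = len(batch)
    return fit_sum / n + lam * reg_sum / n


if __name__ == "__main__":
    w, b = [1.0, 2.0], 0.5
    states = [[1.0, 0.0], [0.0, 1.0]]
    targets = [1.5, 2.0]
    print(regularized_value_loss(w, b, states, targets, sigma=0.0))
    print(regularized_value_loss(w, b, states, targets, sigma=0.3))
```

Setting `sigma` to 0 (or `lam` to 0) removes the penalty, so the loss reduces to plain mean-squared error on the returns; raising `lam` trades fit to the observed returns for a value estimate that changes less under small input perturbations, which is the intuition behind using such noise to improve generalization.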
What problem does this paper attempt to address?