DMCL: Robot Autonomous Navigation Via Depth Image Masked Contrastive Learning

Jiahao Jiang, Ping Li, Xudong Lv, Yuxiang Yang
DOI: https://doi.org/10.1109/IROS55552.2023.10341836
2023-01-01
Abstract: Achieving high performance in deep reinforcement learning relies heavily on obtaining good state representations from pixel inputs. However, learning a mapping from observation space to action space is challenging when the inputs are high-dimensional, particularly when the input states are consecutive depth images. We also observe that, in autonomous navigation of a mobile robot, consecutive depth images are highly correlated, which motivates us to capture the temporal correlations between consecutive inputs and to infer how the scene changes. To this end, we propose a novel end-to-end robot vision navigation method dubbed DMCL, which obtains a good spatial-temporal state representation via Depth image Masked Contrastive Learning. It reconstructs the latent representation from consecutive depth images that are masked in both the spatial and temporal dimensions, yielding a complete representation of the environment state. To obtain the optimal navigation policy, we combine this representation learning with the Soft Actor-Critic (SAC) reinforcement learning algorithm. Extensive experiments demonstrate that the proposed DMCL outperforms representative state-of-the-art methods. The source code will be made publicly available.
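To make the two ingredients named in the abstract concrete, the following is a minimal illustrative sketch of spatial-temporal masking of a depth-image stack and of a contrastive (InfoNCE-style) objective. It is not the authors' implementation: the abstract does not specify the masking scheme, the loss, or any hyperparameters, so the function names `mask_depth_stack` and `info_nce`, the patch size, the masking ratios, the temperature, and the choice of dropping whole frames as "temporal" masking are all assumptions made for illustration.

```python
import numpy as np


def mask_depth_stack(frames, patch=8, spatial_ratio=0.5, temporal_ratio=0.25, rng=None):
    """Mask a (T, H, W) stack of depth images in time and space.

    Hypothetical scheme (an assumption, not taken from the paper):
    temporal masking zeroes whole frames, spatial masking zeroes
    random square patches in every frame.
    """
    rng = np.random.default_rng() if rng is None else rng
    t, h, w = frames.shape
    if h % patch or w % patch:
        raise ValueError("H and W must be multiples of the patch size")
    masked = frames.copy()

    # Temporal masking: pick frames without replacement and zero them.
    n_drop = int(round(temporal_ratio * t))
    dropped = rng.choice(t, size=n_drop, replace=False)
    masked[dropped] = 0.0

    # Spatial masking: zero a fixed number of patches in each frame.
    grid_h, grid_w = h // patch, w // patch
    n_patches = int(round(spatial_ratio * grid_h * grid_w))
    for i in range(t):
        chosen = rng.choice(grid_h * grid_w, size=n_patches, replace=False)
        for k in chosen:
            r, c = divmod(int(k), grid_w)
            masked[i, r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0.0
    return masked, dropped


def info_nce(queries, keys, temperature=0.1):
    """InfoNCE loss over (N, D) embeddings.

    The i-th key is the positive for the i-th query; all other keys
    in the batch act as negatives.
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = (q @ k.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# Example: mask a stack of 4 synthetic 32x32 depth frames.
rng = np.random.default_rng(0)
stack = rng.uniform(0.1, 10.0, size=(4, 32, 32))
masked_stack, dropped_frames = mask_depth_stack(stack, rng=rng)
```

In a setup of this kind, an encoder would typically embed the masked stack as the query and the unmasked stack as the key, so that minimizing the loss pushes the representation of the masked input toward that of the complete observation. The encoder and its pairing with SAC are omitted here because the abstract gives no details about them.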