Learn to Triangulate Scene Coordinates for Visual Localization

Xiang Guo, Tianrui Chen, Bo Li, Qi Liu, Huarong Jia, Yuchao Dai
DOI: https://doi.org/10.1109/lra.2024.3362637
IF: 5.2
2024-01-01
IEEE Robotics and Automation Letters
Abstract: Visual localization plays a critical role in robotics, and scene coordinate regression-based localization methods have achieved state-of-the-art performance. However, a gap remains between the scene coordinates these methods regress and the ground truth scene coordinates, which limits further gains in localization accuracy. These methods generally use structure-from-motion (SfM) or depth sensors to generate proxy scene coordinate labels for training supervision, but the proxy labels are contaminated with errors and noise, making them sub-optimal for training. To resolve this issue, we introduce a simple yet effective triangulation constraint that can be easily incorporated into any scene coordinate regression-based framework. Instead of directly regressing the scene coordinates, the constraint drives the network to learn to triangulate the ground truth scene coordinates, without any proxy scene coordinate labels for supervision. Extensive experiments across multiple public datasets show that our triangulation constraint yields significant improvements and even achieves better results without proxy labels for supervision. Furthermore, our method can recover denser and more complete 3D models than SfM and other localization methods.
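To make the idea of triangulating a scene coordinate concrete, the following is a minimal sketch of classical linear (DLT) triangulation from several calibrated views, together with a reprojection residual of the kind a multi-view consistency term could penalize. It is an illustration only and is not the loss or network described in the paper: the function names `triangulate_dlt`, `project`, and `reprojection_residual`, and the camera matrices in the usage comments, are assumptions introduced here.

```python
import numpy as np


def project(P, X):
    """Project a 3D point X (shape (3,)) with a 3x4 camera matrix P to pixels."""
    x = P @ np.append(X, 1.0)  # homogeneous image point
    return x[:2] / x[2]


def triangulate_dlt(projections, pixels):
    """Linear (DLT) triangulation of one 3D point from two or more views.

    projections: sequence of 3x4 camera matrices, P = K [R | t]
    pixels:      sequence of (u, v) observations of the same point
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X_h = vt[-1]
    return X_h[:3] / X_h[3]


def reprojection_residual(X, projections, pixels):
    """Mean pixel distance between observations and the reprojection of X.

    Hypothetical stand-in for a multi-view consistency term; the paper's
    actual constraint may be formulated differently.
    """
    errors = [np.linalg.norm(project(P, X) - np.asarray(uv))
              for P, uv in zip(projections, pixels)]
    return float(np.mean(errors))
```

As a usage example under assumed values, two cameras with intrinsics `K = [[500, 0, 320], [0, 500, 240], [0, 0, 1]]`, the first at the origin and the second translated by one unit along the x axis, observe the same point. Passing both projections and both pixel observations to `triangulate_dlt` recovers the point, and `reprojection_residual` evaluated at the recovered point is close to zero for noise-free observations.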