Abstract: This study addresses visual localization from monocular images, a key technology that enables autonomous systems to navigate and interact with their surroundings. Deep-learning-based visual localization techniques have demonstrated improved robustness across diverse environments. Existing end-to-end models use convolutional neural networks (CNNs) to extract salient features and directly estimate continuous spatial poses from map models that allow for implicit differentiation. Nonetheless, these models often fail to adapt their feature representations to extreme variations in environmental conditions, leading to critical localization errors under altered lighting, varying weather, or in the presence of moving objects. To overcome these limitations, we introduce the end-to-end feature refinement network for visual localization (EFRNet-VL). The architecture is designed to prioritize the extraction of static features that are crucial for six-degrees-of-freedom (6DoF) pose estimation, and it outperforms prior methods. EFRNet-VL integrates a convolutional network structure with self-attention mechanisms and Long Short-Term Memory (LSTM) modules, which together allow a single image to be associated accurately with its corresponding camera pose, even in dynamic environments. The proposed feature refinement approach is straightforward to implement and can enhance the performance of existing neural pose estimators. Our comprehensive evaluations of EFRNet-VL underscore its effectiveness. Compared with the popular PoseNet model, it reduces the average position and orientation errors by 54.5% and 25.7%, respectively, across various indoor settings. In large-scale outdoor environments, it achieves an average localization accuracy of 7.02 m/2.79°. EFRNet-VL sets a new benchmark for end-to-end learning-based visual localization and runs in real time, processing each image frame in 9.8 ms.
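
The abstract reports results as a position error in metres paired with an orientation error in degrees, and as percentage reductions relative to PoseNet. The abstract does not spell out how these are computed. The following is a minimal sketch assuming the convention commonly used for PoseNet-style pose regressors: Euclidean distance for position, and the smallest rotation angle between unit quaternions for orientation. The function names and the quaternion ordering (w, x, y, z) are illustrative assumptions, not taken from the paper.

```python
import math


def position_error(t_pred, t_true):
    """Euclidean distance between predicted and ground-truth positions."""
    return math.dist(t_pred, t_true)


def orientation_error_deg(q_pred, q_true):
    """Smallest rotation angle, in degrees, between two quaternions (w, x, y, z)."""
    # Normalise so that the dot product is a valid cosine of the half-angle.
    n_pred = math.sqrt(sum(c * c for c in q_pred))
    n_true = math.sqrt(sum(c * c for c in q_true))
    dot = sum(a * b for a, b in zip(q_pred, q_true)) / (n_pred * n_true)
    # q and -q encode the same rotation, so use |dot|; clamp against rounding.
    dot = min(1.0, abs(dot))
    return math.degrees(2.0 * math.acos(dot))


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    # Illustrative values only; these are not results from the paper.
    half = math.sqrt(0.5)
    print(position_error((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
    print(orientation_error_deg((1.0, 0.0, 0.0, 0.0), (half, 0.0, 0.0, half)))
    print(relative_reduction(2.0, 1.0))
```

Under this convention, a figure such as 7.02 m/2.79° would be an aggregate of these two per-image errors over a test set, and a reported reduction would compare a baseline's aggregate error with the new model's.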

Prior Guided Dropout for Robust Visual Localization in Dynamic Environments

Communication Constrained Cloud-Based Long-Term Visual Localization in Real Time

Deterministic Optimality for Robust Vehicle Localization Using Visual Measurements

Leveraging Local Planar Motion Property for Robust Visual Matching and Localization

Long-Term Map-Based Visual Localization: Analysis of Individual Components of a Hierarchical Pipeline

2-Entity RANSAC for Robust Visual Localization in Changing Environment

2-Entity Random Sample Consensus for Robust Visual Localization: Framework, Methods, and Verifications

Feature Regions Segmentation Based RGB-D Visual Odometry in Dynamic Environment

Robust Monocular SLAM in Dynamic Environments

Visual Localization in a Prior 3D LiDAR Map Combining Points and Lines

Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization

Robust RGB-D SLAM in Dynamic Environment Using Faster R-CNN

EnforceNet: Monocular Camera Localization in Large Scale Indoor Sparse LiDAR Point Cloud

Robust Stereo Visual SLAM for Dynamic Environments With Moving Object

Pose Refinement: Bridging the Gap Between Unsupervised Learning and Geometric Methods for Visual Odometry

Robust Visual Localization with Dynamic Uncertainty Management in Omnidirectional SLAM

EFRNet-VL: An end-to-end feature refinement network for monocular visual localization in dynamic environments

Robust Neural Routing Through Space Partitions for Camera Relocalization in Dynamic Indoor Environments

Real-time Visual SLAM based YOLO-Fastest for Dynamic Scenes

Robust self-supervised monocular visual odometry based on prediction-update pose estimation network

DF-VO: What Should Be Learnt for Visual Odometry?