Abstract: Camera relocalization is the challenging task of estimating the camera pose within a known scene, with wide applications in Virtual Reality (VR), Augmented Reality (AR), robotics, and other fields. Most existing learning-based methods use all the information within an image for pose estimation. Although these methods achieve leading pose accuracy in some cases, they still struggle to remain robust under challenging viewpoints without sacrificing localization accuracy on viewpoints that are easier to localize. In this paper, we propose a novel two-branch camera pose estimation framework: one branch uses keypoint-guided partial scene coordinate regression, while the other uses full scene coordinate regression to assess the credibility of image poses, thereby enabling more accurate camera localization. In particular, we devise a keypoint selection method based on matching rates, which measure the matching quality between a 3D keypoint and its 2D keypoints across views. With the selected 3D keypoints and the ground-truth camera pose, we generate a 2D supervision mask to supervise the keypoint predictions of the keypoint selection network. We further refine the 2D supervision mask by optimizing reprojection errors on the scene coordinate network, so that scene coordinates are estimated for the points in the scene that truly warrant attention, which also improves localization performance. We also introduce a gated camera pose estimation strategy on the two-branch framework, which employs the updated keypoint selection network for images with higher credibility and a more robust network for difficult viewpoints. By adopting an effective curriculum learning scheme, we achieve higher accuracy within a training span of just 20 minutes. Experiments validate the superior performance of our method.
The code is released at https://github.com/DUT-ICCD/KP-Guided-Reloc.
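
As a rough illustration of the keypoint selection idea in the abstract, the sketch below computes a matching rate per 3D keypoint and projects the selected keypoints into a 2D mask using a known camera pose. It is only a sketch under stated assumptions, not the released implementation: the abstract does not define the matching rate formula, so it is assumed here to be the fraction of views in which a visible 3D keypoint was successfully matched to a 2D keypoint. The function names, the threshold value, and the world-to-camera pose convention are all hypothetical.

```python
import numpy as np


def matching_rates(visible, matched):
    """Assumed definition: matched views / visible views, per 3D keypoint.

    visible, matched: boolean arrays of shape (num_points, num_views).
    """
    visible = np.asarray(visible, dtype=bool)
    # A match only counts in views where the point is actually visible.
    matched = np.asarray(matched, dtype=bool) & visible
    n_visible = visible.sum(axis=1)
    n_matched = matched.sum(axis=1)
    # Points that are never visible receive a rate of 0.
    return np.divide(
        n_matched,
        n_visible,
        out=np.zeros(len(n_visible)),
        where=n_visible > 0,
    )


def select_keypoints(rates, threshold=0.5):
    """Return indices of keypoints whose rate reaches the (hypothetical) threshold."""
    return np.flatnonzero(np.asarray(rates) >= threshold)


def project_to_mask(points_world, R, t, K, height, width):
    """Project 3D points into a binary image mask.

    Assumes a pinhole camera with intrinsics K and a world-to-camera
    pose (R, t), i.e. x_cam = R @ x_world + t.
    """
    cam = np.asarray(points_world, dtype=float) @ R.T + t
    cam = cam[cam[:, 2] > 1e-6]          # keep points in front of the camera
    pix = cam @ K.T                      # homogeneous pixel coordinates
    uv = pix[:, :2] / pix[:, 2:3]
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    mask = np.zeros((height, width), dtype=bool)
    mask[v[inside], u[inside]] = True    # rows are v (y), columns are u (x)
    return mask
```

In this sketch the mask marks only pixels that correspond to reliably matched 3D keypoints, which is the kind of signal the abstract describes as supervision for the keypoint selection network. The later refinement of the mask with reprojection errors is not modeled here.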

Exploring Matching Rates: from Keypoint Selection to Camera Relocalization
