Abstract: Camera relocalization is the challenging task of estimating the camera pose within a known scene, with wide applications in Virtual Reality (VR), Augmented Reality (AR), and robotics. Most existing learning-based methods use all the information within an image for pose estimation. Although these methods have demonstrated leading pose accuracy in some cases, they still struggle to remain robust under challenging viewpoints without sacrificing localization accuracy for viewpoints that are easier to localize. In this paper, we propose a novel two-branch camera pose estimation framework: one branch uses keypoint-guided partial scene coordinate regression, while the other employs full scene coordinate regression to assess the credibility of image poses, thereby enabling more accurate camera localization. In particular, we devise a keypoint selection method based on matching rates, which measure the matching quality between a 3D keypoint and 2D keypoints across views. With these selected 3D keypoints, we generate a 2D supervision mask using the ground-truth camera pose to supervise the keypoint prediction of the keypoint selection network. Meanwhile, we further refine the 2D supervision mask by optimizing reprojection errors on the scene coordinate network, which estimates scene coordinates for the points within the scene that truly warrant attention; this refinement also enhances localization performance. We also introduce a gated camera pose estimation strategy on the two-branch framework, employing the updated keypoint selection network for images with higher credibility and a more robust network for difficult viewpoints. By adopting an effective curriculum learning scheme, we achieve higher accuracy within a training span of just 20 minutes. Experiments validate the superior performance of our method.
The code is released at https://github.com/DUT-ICCD/KP-Guided-Reloc.
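
As a rough illustration of the keypoint selection and mask generation steps described in the abstract, the following is a minimal sketch and not the released implementation. It assumes that the matching rate of a 3D keypoint is the fraction of views in which it is visible and also matched to a 2D keypoint, that selection is a simple threshold on that rate, and that the 2D supervision mask is obtained by pinhole projection with a world-to-camera ground-truth pose. The function names (`matching_rates`, `select_keypoints`, `project_mask`), the threshold, and the array layouts are all hypothetical.

```python
import numpy as np


def matching_rates(visible, matched):
    """Per-point matching rate (assumed definition: matched views / visible views).

    visible, matched: boolean arrays of shape (num_points, num_views).
    """
    visible = np.asarray(visible, dtype=bool)
    # A match only counts in a view where the point is actually visible.
    matched = np.asarray(matched, dtype=bool) & visible
    seen = visible.sum(axis=1)
    hits = matched.sum(axis=1)
    # Points that are never visible get a rate of 0 instead of dividing by zero.
    return np.divide(hits, seen, out=np.zeros(len(seen), dtype=float), where=seen > 0)


def select_keypoints(rates, threshold=0.5):
    """Indices of 3D keypoints whose matching rate reaches the (hypothetical) threshold."""
    return np.flatnonzero(np.asarray(rates) >= threshold)


def project_mask(points_w, R, t, K, height, width):
    """Binary 2D mask marking pixels hit by the projected 3D keypoints.

    Assumes a world-to-camera pose (x_cam = R @ x_world + t) and pinhole intrinsics K.
    """
    cam = np.asarray(points_w, dtype=float) @ R.T + t
    cam = cam[cam[:, 2] > 1e-6]          # keep points in front of the camera
    uvw = cam @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]        # perspective division
    cols = np.round(uv[:, 0]).astype(int)
    rows = np.round(uv[:, 1]).astype(int)
    inside = (rows >= 0) & (rows < height) & (cols >= 0) & (cols < width)
    mask = np.zeros((height, width), dtype=bool)
    mask[rows[inside], cols[inside]] = True
    return mask


# Toy data: 3 points observed across 4 views.
visible = np.array([[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 0, 0]], dtype=bool)
matched = np.array([[1, 1, 1, 0], [1, 0, 1, 0], [0, 0, 0, 0]], dtype=bool)
rates = matching_rates(visible, matched)
selected = select_keypoints(rates, threshold=0.6)

# Toy projection with an identity pose and simple intrinsics.
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 24.0], [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 1.0], [0.1, 0.05, 1.0], [0.0, 0.0, -1.0], [10.0, 0.0, 1.0]])
mask = project_mask(points, np.eye(3), np.zeros(3), K, height=48, width=64)
```

In this sketch, a point behind the camera or projecting outside the image contributes nothing to the mask. The refinement of the mask via reprojection errors and the gated choice between the two branches are not covered here.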

LifelongGlue: Keypoint Matching for 3D Reconstruction with Continual Neural Networks

Brain Inspired Keypoint Matching for 3D Scene Reconstruction

CMDGAT: Knowledge extraction and retention based continual graph attention network for point cloud registration

Multi-Task Joint Learning of 3D Keypoint Saliency and Correspondence Estimation

Enhanced 3D Shape Reconstruction With Knowledge Graph of Category Concept

DDM-NET: End-to-end learning of keypoint feature Detection, Description and Matching for 3D localization

Exploring Matching Rates: from Keypoint Selection to Camera Relocalization

Unsupervised Learning of 3D Semantic Keypoints with Mutual Reconstruction

Learning Feature Matching via Matchable Keypoint-Assisted Graph Neural Network

Lifelong 3D Object Recognition and Grasp Synthesis Using Dual Memory Recurrent Self-Organization Networks

L3DOR: Lifelong 3D Object Recognition

Parallel K Nearest Neighbor Matching for 3D Reconstruction

Keypoints-guided Lightweight Network for Single-view 3D Human Reconstruction

A Multi-Task Neural Network for Action Recognition with 3D Key-Points

Object Reconstruction Based on Attentive Recurrent Network from Single and Multiple Images

Repeatable Adaptive Keypoint Detection Via Self-Supervised Learning

Learning 2D-3D Correspondences To Solve The Blind Perspective-n-Point Problem

2D3D-MatchNet: Learning to Match Keypoints Across 2D Image and 3D Point Cloud

AGO-Net: Association-Guided 3D Point Cloud Object Detection Network

Fast-Image2Point: Towards Real-Time Point Cloud Reconstruction of a Single Image using 3D Supervision

A Feature Matching Method Based on the Convolutional Neural Network