Abstract: Visual localization plays a critical role in low-cost autonomous mobile robots. The leading methods for precise visual localization are predominantly 3D scene-specific: they require extra computation and memory to construct a 3D scene model in each new environment. Localizing directly against a database of 2D images is more flexible, but such methods currently suffer from limited localization accuracy. In this paper, we propose an accurate and robust 3D model-free visual localization system based on multiple checking to address these issues. To ensure high accuracy, we focus on estimating the pose of a query image relative to the retrieved database images using 2D-2D feature matches. On the theoretical side, by incorporating the local planar motion constraint into both the essential matrix estimation and the triangulation stages, we reduce the minimum number of feature matches required for absolute pose estimation, which makes outlier rejection more robust. Additionally, we introduce a multiple-checking mechanism to ensure the correctness of the solution throughout the solving process. Qualitative and quantitative evaluations on simulated data and two real-world datasets show that our 3D model-free visual localization system significantly improves accuracy and robustness.

Note to Practitioners: The motivation of this article stems from the need for an accurate visual localization system that keeps map construction simple and flexible and adapts easily to new environments. Such a system has great practical value for a range of applications, including warehouse robots and service robots. Existing visual localization systems that achieve high accuracy depend on a pre-built, accurate 3D scene map, which is challenging to construct and consumes significant onboard storage, particularly for large scenes. This effort must also be repeated whenever the robot moves to a new scene. In this article, an accurate and robust 3D model-free visual localization system is proposed to handle this problem. Map construction is reduced to building a set of database images with associated camera poses, which is trivial because it amounts to adding posed images to a database. The core idea for achieving high accuracy and robustness is to model the local planar motion characteristic of general ground-moving robots in both the essential matrix estimation and the triangulation stages to obtain two minimal solutions. The proposed localization system makes it easier to switch the robot between application scenarios, reducing additional workload and lowering the difficulty of use.
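
To illustrate why a planar motion constraint reduces the number of matches needed, the sketch below builds an essential matrix under the textbook global planar motion model: rotation about the camera y-axis by a yaw angle and translation confined to the x-z plane. This is an assumption made for illustration only; the paper's local planar motion model and its minimal solvers may be formulated differently. The function name `planar_essential` and the parameters `theta` and `phi` are hypothetical. Under this model the essential matrix has only two free parameters and a fixed zero pattern, whereas a general essential matrix has five degrees of freedom.

```python
import numpy as np


def planar_essential(theta, phi):
    """Essential matrix E = [t]_x R for planar motion (illustrative assumption).

    theta: yaw angle about the camera y-axis (radians).
    phi:   direction of the unit translation inside the x-z plane (radians).
    """
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])
    t = np.array([np.sin(phi), 0.0, np.cos(phi)])  # no motion along y
    t_cross = np.array([[0.0, -t[2], t[1]],
                        [t[2], 0.0, -t[0]],
                        [-t[1], t[0], 0.0]])
    return t_cross @ R, R, t


# Synthetic check: points in front of camera 1, moved into camera 2.
rng = np.random.default_rng(0)
E, R, t = planar_essential(theta=0.3, phi=-0.5)
X1 = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(10, 3))
X2 = X1 @ R.T + t

# Normalized image coordinates (homogeneous, last component 1).
x1 = X1 / X1[:, 2:3]
x2 = X2 / X2[:, 2:3]

# Epipolar residuals x2^T E x1, one per correspondence.
residuals = np.einsum("ij,jk,ik->i", x2, E, x1)

# Entries of E that are structurally zero under this planar model.
zero_entries = E[[0, 0, 1, 2, 2], [0, 2, 1, 0, 2]]

print(np.round(E, 4))
print("max |residual|:", np.abs(residuals).max())
```

Because only `theta` and `phi` are unknown in this simplified model, far fewer correspondences are needed per hypothesis than in the general five-point case, so a RANSAC-style loop can reach an outlier-free sample in fewer iterations.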

EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization

Leveraging Local Planar Motion Property for Robust Visual Matching and Localization

LocNet: Global Localization in 3D Point Clouds for Mobile Vehicles

3D Model-free Visual Localization System from Essential Matrix under Local Planar Motion

Long-Term Map-Based Visual Localization: Analysis of Individual Components of a Hierarchical Pipeline

2-Entity RANSAC for Robust Visual Localization in Changing Environment

Learning to Produce Semi-dense Correspondences for Visual Localization

DP-Loc: Visual Localization in 2D Maps Using an Embedded Depth Prior

Learning Bipartite Graph Matching for Robust Visual Localization

2D-3D Cross-Modality Network for End-to-End Localization with Probabilistic Supervision

Visual Localization in a Prior 3D LiDAR Map Combining Points and Lines

D2S: Representing sparse descriptors and 3D coordinates for camera relocalization

DDM-NET: End-to-end learning of keypoint feature Detection, Description and Matching for 3D localization

BDLoc: Global Localization from 2.5D Building Map

P2-Net: Joint Description and Detection of Local Features for Pixel and Point Matching

I2D-Loc: Camera Localization Via Image to LiDAR Depth Flow

Exploring Matching Rates: from Keypoint Selection to Camera Relocalization

Leveraging local and global descriptors in parallel to search correspondences for visual localization

Sparse-to-Dense Hypercolumn Matching for Long-Term Visual Localization

InLoc: Indoor Visual Localization with Dense Matching and View Synthesis