Abstract: Hierarchical methods represent state-of-the-art visual localization, optimizing search efficiency by using global descriptors to focus on relevant map regions. However, this state-of-the-art performance comes at the cost of substantial memory requirements, as all database images must be stored for feature matching. In contrast, direct 2D-3D matching algorithms require significantly less memory but suffer from lower accuracy due to the larger and more ambiguous search space. We address this ambiguity by fusing local and global descriptors using a weighted average operator within a 2D-3D search framework. This fusion rearranges the local descriptor space such that geographically nearby local descriptors are closer in the feature space according to the global descriptors. Therefore, the number of irrelevant competing descriptors decreases, specifically if they are geographically distant, thereby increasing the likelihood of correctly matching a query descriptor. We consistently improve the accuracy over local-only systems and achieve performance close to hierarchical methods while halving memory requirements. Extensive experiments using various state-of-the-art local and global descriptors across four different datasets demonstrate the effectiveness of our approach. For the first time, our approach enables direct matching algorithms to benefit from global descriptors while maintaining memory efficiency. The code for this paper will be published at https://github.com/sontung/descriptor-disambiguation.

What problem does this paper attempt to address?

This paper addresses search-space ambiguity in visual localization, particularly in direct 2D-3D matching algorithms. Specifically:

1. **Background and Problem**:
   - Visual localization determines the position and orientation of a camera or robot in its environment by analyzing RGB images.
   - Direct 2D-3D matching is memory-efficient, but in large-scale maps perceptual aliasing makes the search space ambiguous. This produces many false matches and lowers localization accuracy.
   - Hierarchical methods improve search efficiency and accuracy by using global descriptors to focus on relevant map regions, but they must store all database images, which requires substantial memory.
2. **Research Objectives**:
   - The authors propose a method (FUSELOC) that fuses global and local descriptors to reduce search-space ambiguity in direct 2D-3D matching.
   - With global descriptors mixed in, local descriptors from geographically nearby locations lie closer together in feature space. Fewer irrelevant descriptors then compete for each query, which raises the probability of a correct match.
3. **Solutions**:
   - A weighted-average operator fuses the global and local descriptors into a new descriptor used for 2D-3D matching.
   - This fusion improves matching accuracy and achieves performance close to that of hierarchical methods while keeping memory consumption low.
4. **Experimental Results**:
   - Extensive experiments on four different datasets show that the method improves localization accuracy while maintaining memory efficiency.
   - Compared with systems that use only local descriptors, the method improves the success rate by 2-6% on average across multiple datasets and narrows the performance gap with hierarchical methods.

In summary, this paper aims to resolve search-space ambiguity in direct 2D-3D matching by fusing global and local descriptors, improving the accuracy of visual localization while keeping memory requirements low.
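
The weighted-average fusion described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`fuse_descriptors`, `nearest`), the weight `w`, the L2 normalization, and the assumption that local and global descriptors share the same dimensionality are all hypothetical choices made for illustration. The toy data models perceptual aliasing: two map points from different places have local descriptors that are equally close to the query, and only the global descriptor of their source image tells them apart.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def fuse_descriptors(local_desc, global_desc, w=0.5):
    """Weighted average of local descriptors (N, D) with one global descriptor (D,).

    Assumption: both descriptor types have the same dimension D.
    `w` is a hypothetical weight on the global descriptor.
    """
    local_desc = l2_normalize(np.asarray(local_desc, dtype=np.float64))
    global_desc = l2_normalize(np.asarray(global_desc, dtype=np.float64))
    fused = (1.0 - w) * local_desc + w * global_desc  # global broadcasts over rows
    return l2_normalize(fused)


def nearest(query, database):
    """Return the index of the closest database row and all distances."""
    dists = np.linalg.norm(database - query, axis=1)
    return int(np.argmin(dists)), dists


# Two map points from different places (A and B) that look almost identical.
local_a = np.array([[1.0, 0.02, 0.0, 0.0]])
local_b = np.array([[1.0, 0.0, 0.02, 0.0]])
# Global descriptors of the images the two points were observed in.
global_a = np.array([0.0, 0.0, 0.0, 1.0])
global_b = np.array([0.0, 0.0, 1.0, 0.0])

# Query keypoint taken at place A: its local descriptor sits between A and B,
# while its global descriptor resembles that of place A.
query_local = np.array([[1.0, 0.01, 0.01, 0.0]])
query_global = np.array([0.0, 0.0, 0.1, 1.0])

# Local-only matching: both candidates are (almost) equally close.
db_local = l2_normalize(np.vstack([local_a, local_b]))
idx_local, d_local = nearest(l2_normalize(query_local)[0], db_local)

# Fused matching: the global component pulls the query towards place A.
db_fused = np.vstack([
    fuse_descriptors(local_a, global_a),
    fuse_descriptors(local_b, global_b),
])
idx_fused, d_fused = nearest(fuse_descriptors(query_local, query_global)[0], db_fused)

print("local-only distances:", d_local)
print("fused distances:     ", d_fused)
print("fused nearest index: ", idx_fused)
```

In this toy setup the local-only distances to the two candidates are essentially tied, whereas the fused distances separate them clearly. A larger `w` gives the global descriptor more influence, at the risk of blurring distinctions between local descriptors from the same image.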

FUSELOC: Fusing Global and Local Descriptors to Disambiguate 2D-3D Matching in Visual Localization

Degeneration-Aware Localization with Arbitrary Global-Local Sensor Fusion

Leveraging local and global descriptors in parallel to search correspondences for visual localization

Descriptor Ensemble: An Unsupervised Approach to Descriptor Fusion in the Homography Space

DH3D: Deep Hierarchical 3D Descriptors for Robust Large-Scale 6DoF Relocalization

D2S: Representing sparse descriptors and 3D coordinates for camera relocalization

Leveraging Semantic Cues from Foundation Vision Models for Enhanced Local Feature Correspondence

Learning to fuse local geometric features for 3D rigid data matching

Affine-invariant SIFT Descriptor with Global Context

Improving Feature-based Visual Localization by Geometry-Aided Matching

GeoDesc: Learning Local Descriptors by Integrating Geometry Constraints

Context-based local-global fusion network for 3D point cloud classification and segmentation

LATFormer: Locality-Aware Point-View Fusion Transformer for 3D shape recognition

Uniting Keypoints: Local Visual Information Fusion for Large-Scale Image Search

A Local-Global Feature Fusing Method for Point Clouds Semantic Segmentation

Hierarchical Multi-Process Fusion for Visual Place Recognition

MTLDesc: Looking Wider to Describe Better

IMFNet: Interpretable Multimodal Fusion for Point Cloud Registration

SuperGF: Unifying Local and Global Features for Visual Localization