FaVoR: Features via Voxel Rendering for Camera Relocalization

Vincenzo Polizzi, Marco Cannici, Davide Scaramuzza, Jonathan Kelly
2024-09-12
Abstract: Camera relocalization methods range from dense image alignment to direct camera pose regression from a query image. Among these, sparse feature matching stands out as an efficient, versatile, and generally lightweight approach with numerous applications. However, feature-based methods often struggle with significant viewpoint and appearance changes, leading to matching failures and inaccurate pose estimates. To overcome this limitation, we propose a novel approach that leverages a globally sparse yet locally dense 3D representation of 2D features. By tracking and triangulating landmarks over a sequence of frames, we construct a sparse voxel map optimized to render image patch descriptors observed during tracking. Given an initial pose estimate, we first synthesize descriptors from the voxels using volumetric rendering and then perform feature matching to estimate the camera pose. This methodology enables the generation of descriptors for unseen views, enhancing robustness to view changes. We extensively evaluate our method on the 7-Scenes and Cambridge Landmarks datasets. Our results show that our method significantly outperforms existing state-of-the-art feature representation techniques in indoor environments, achieving up to a 39% improvement in median translation error. Additionally, our approach yields comparable results to other methods for outdoor scenarios while maintaining lower memory and computational costs.
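The abstract does not give the rendering equation. As a minimal sketch, assuming FaVoR follows the standard emission-absorption quadrature of NeRF-style volume rendering, the descriptor for one camera ray through a landmark's local voxel grid can be composited as d_hat = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * d_i, with transmittance T_i = prod_{j<i} exp(-sigma_j * delta_j). The function `render_descriptor` and its inputs (per-sample densities sigma_i, descriptors d_i and step lengths delta_i, assumed to be already interpolated from the voxels) are hypothetical illustrations, not the authors' code.

```python
import math
from typing import List, Sequence


def render_descriptor(
    densities: Sequence[float],
    descriptors: Sequence[Sequence[float]],
    deltas: Sequence[float],
) -> List[float]:
    """Alpha-composite per-sample descriptors along one camera ray.

    Hypothetical helper: assumes NeRF-style emission-absorption rendering.
    densities[i]   : non-negative volume density sigma_i at sample i
    descriptors[i] : D-dimensional descriptor d_i at sample i
    deltas[i]      : distance from sample i to sample i + 1
    """
    if not (len(densities) == len(descriptors) == len(deltas)):
        raise ValueError("per-sample inputs must have equal length")
    dim = len(descriptors[0]) if descriptors else 0
    out = [0.0] * dim
    transmittance = 1.0  # T_i: fraction of the ray not yet absorbed before sample i
    for sigma, feat, delta in zip(densities, descriptors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha  # contribution of sample i to the result
        for k in range(dim):
            out[k] += weight * feat[k]
        transmittance *= 1.0 - alpha  # light surviving past sample i
    return out
```

For example, a ray whose only non-zero-density sample (sigma = 50, delta = 0.1) carries the descriptor [0, 1] returns approximately [0.0, 0.993], i.e. that descriptor scaled by the sample's opacity 1 - exp(-5).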
Computer Vision and Pattern Recognition, Robotics
What problem does this paper attempt to address?
This paper addresses key problems in visual relocalization (camera relocalization), particularly under significant viewpoint and appearance changes, where traditional feature-based methods often fail to match features and produce inaccurate pose estimates. Specifically:

- **Problem Background**: Existing visual relocalization methods struggle with large viewpoint and appearance changes, especially at the feature-matching stage. Although traditional feature-based methods are efficient and lightweight, they are prone to matching failures or inaccurate pose estimates in these cases.
- **Shortcomings of Existing Methods**:
  - Dense descriptor representations (such as NeRF-based methods) improve performance but require longer training times and more memory.
  - Sparse descriptor synthesis methods have difficulty rendering high-dimensional descriptors, which limits their applicability.
- **Solution Proposed in the Paper**: To overcome these limitations, the authors propose FaVoR (Features via Voxel Rendering), a new feature-rendering method. It uses a pretrained neural network to extract robust features, then encodes and renders their descriptors in 3D space through a sparse voxel representation. The main features of FaVoR are:
  - **Globally Sparse but Locally Dense 3D Representation**: Feature points are tracked and triangulated over multiple frames to build a sparse voxel map, which is optimized to render the observed image patch descriptors.
  - **View-Conditioned Descriptor Extraction for 3D Points**: Descriptors can be efficiently rendered for any query camera pose.
  - **Low Resource Consumption**: Compared with other methods, FaVoR reduces the computational burden and improves scalability.
- **Main Contributions**:
  - A sparse voxel algorithm that does not require learning a dense volumetric scene representation.
  - A demonstration of how to render high-dimensional descriptors, providing better view invariance.
  - Experiments on the 7-Scenes and Cambridge Landmarks datasets showing that FaVoR significantly outperforms existing implicit feature-rendering methods, reducing the median translation error by up to 39% in indoor environments.

In summary, this paper tackles the limitations of existing visual relocalization techniques under significant viewpoint and appearance changes by introducing FaVoR, a more efficient and robust solution.
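The summary says rendered descriptors are matched against the query image to estimate the camera pose, but it does not name the matching rule. The sketch below assumes mutual nearest-neighbour matching under cosine similarity, a common choice for learned descriptors; the helper names `mutual_nn_matches` and `_normalize` and the `min_sim` threshold are hypothetical. Each returned (landmark, keypoint) pair is a 2D-3D correspondence that would typically be passed to a PnP solver inside RANSAC to recover the pose; that step is not shown.

```python
import math
from typing import List, Sequence, Tuple


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def mutual_nn_matches(
    rendered: Sequence[Sequence[float]],
    query: Sequence[Sequence[float]],
    min_sim: float = 0.0,  # hypothetical similarity threshold
) -> List[Tuple[int, int]]:
    """Return (landmark_idx, keypoint_idx) pairs that are mutual nearest
    neighbours under cosine similarity (an assumed matching rule)."""
    r = [_normalize(v) for v in rendered]
    q = [_normalize(v) for v in query]
    if not r or not q:
        return []
    # sim[i][j]: cosine similarity between rendered descriptor i and query j
    sim = [[sum(a * b for a, b in zip(ri, qj)) for qj in q] for ri in r]
    # Best query keypoint for each landmark, and best landmark for each keypoint
    best_q = [max(range(len(q)), key=lambda j: row[j]) for row in sim]
    best_r = [max(range(len(r)), key=lambda i: sim[i][j]) for j in range(len(q))]
    # Keep only pairs that choose each other and are similar enough
    return [
        (i, j)
        for i, j in enumerate(best_q)
        if best_r[j] == i and sim[i][j] >= min_sim
    ]
```

With rendered descriptors [[1, 0], [0, 1], [1, 1]] and query descriptors [[0.9, 0.1], [0.1, 0.9]], the function returns [(0, 0), (1, 1)]: the third landmark's preferred keypoint already prefers another landmark, so the mutual check discards it. The cross-check trades some recall for fewer outliers entering the pose solver.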