GSLoc: Efficient Camera Pose Refinement via 3D Gaussian Splatting

Changkun Liu, Shuai Chen, Yash Bhalgat, Siyan Hu, Ming Cheng, Zirui Wang, Victor Adrian Prisacariu, Tristan Braud
2024-10-02
Abstract: We leverage 3D Gaussian Splatting (3DGS) as a scene representation and propose a novel test-time camera pose refinement framework, GSLoc. This framework enhances the localization accuracy of state-of-the-art absolute pose regression and scene coordinate regression methods. The 3DGS model renders high-quality synthetic images and depth maps to facilitate the establishment of 2D-3D correspondences. GSLoc obviates the need for training feature extractors or descriptors by operating directly on RGB images, utilizing the 3D foundation model, MASt3R, for precise 2D matching. To improve the robustness of our model in challenging outdoor environments, we incorporate an exposure-adaptive module within the 3DGS framework. Consequently, GSLoc enables efficient one-shot pose refinement given a single RGB query and a coarse initial pose estimation. Our proposed approach surpasses leading NeRF-based optimization methods in both accuracy and runtime across indoor and outdoor visual localization benchmarks, achieving new state-of-the-art accuracy on two indoor datasets.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper aims to improve the accuracy and efficiency of camera pose estimation. Specifically, the authors propose GSLoc, a novel test-time camera pose refinement framework that improves the pose estimation accuracy of existing Absolute Pose Regression (APR) and Scene Coordinate Regression (SCR) methods.

### Main Problems

1. **Limitations of Existing Methods**:
   - **APR methods**: Inference is fast, but accuracy and generalization are poor.
   - **SCR methods**: Accuracy is relatively high, but so is computational complexity.
   - **NeRF-based methods**: Methods such as NeFeS can improve the accuracy of APR methods, but they are slow to optimize and have limited effect on SCR methods.
2. **Challenges**:
   - **Low Accuracy and Slow Convergence**: Existing NeRF-based methods converge slowly and reach limited accuracy.
   - **Need to Train Customized Feature Descriptors**: Many methods rely on scene-specific feature extractors or descriptors, which increases training cost and makes deployment harder.

### Solutions

To address these problems, GSLoc proposes the following innovations:

1. **3D Gaussian Splatting (3DGS)**:
   - Use 3DGS as the scene representation and exploit its fast, high-quality Novel View Synthesis (NVS) to render images and depth maps, so that 2D-3D correspondences between the query image and the rendered image can be established efficiently.
2. **Exposure-Adaptive Module**:
   - Introduce an Exposure-Adaptive Affine Color Transformation (ACT) module to make the model more robust in challenging outdoor environments and to keep the appearance of the rendered image consistent with the query image.
3. **Direct RGB Image Matching**:
   - Use the pre-trained 3D vision foundation model MASt3R for accurate 2D-2D matching. This removes the need for scene-specific feature extractors or descriptors, which significantly accelerates the method and simplifies deployment.
4. **Fast Relative Pose Estimation Variant (GSLoc rel)**:
   - Propose a faster alternative, GSLoc rel, which estimates the relative pose through the point-map registration function of MASt3R without matching, further improving computational efficiency.

### Experimental Verification

The paper reports experiments on several widely used visual localization datasets (7Scenes, 12Scenes and Cambridge Landmarks). The results show that GSLoc outperforms existing NeRF-based methods and other mainstream methods in both accuracy and efficiency.

In conclusion, the paper addresses the accuracy and efficiency shortcomings of existing camera pose estimation methods by introducing 3DGS together with the innovations listed above, achieving more efficient and more accurate pose refinement.
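The 2D-3D correspondence step described under Solutions can be illustrated with a minimal sketch. It assumes a pinhole camera, a depth map rendered at the coarse initial pose, and 2D-2D matches between the query image and the rendered image that are already available (for example from a matcher such as MASt3R, which is not called here). The function name `lift_matches_to_3d`, the argument layout, and the validity checks are all illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float]          # (u, v) = (column, row)
Point3 = Tuple[float, float, float]  # (x, y, z) in world coordinates


def lift_matches_to_3d(
    matches: Sequence[Tuple[Pixel, Pixel]],
    depth: Sequence[Sequence[float]],
    intrinsics: Tuple[float, float, float, float],
    cam_to_world: Sequence[Sequence[float]],
) -> List[Tuple[Pixel, Point3]]:
    """Turn 2D-2D matches into 2D-3D correspondences (hypothetical helper).

    matches:      pairs of (query pixel, rendered pixel)
    depth:        rendered depth map, indexed as depth[row][col]
    intrinsics:   (fx, fy, cx, cy) of the rendering camera
    cam_to_world: 4x4 pose of the rendering camera (the coarse initial pose)
    """
    fx, fy, cx, cy = intrinsics
    height, width = len(depth), len(depth[0])
    correspondences = []
    for query_px, (u, v) in matches:
        row, col = int(round(v)), int(round(u))
        # Skip matches that land outside the rendered depth map.
        if not (0 <= row < height and 0 <= col < width):
            continue
        z = depth[row][col]
        # Skip pixels without valid depth (assumed to be encoded as <= 0).
        if z <= 0.0:
            continue
        # Back-project the rendered pixel into the camera frame.
        p_cam = ((u - cx) / fx * z, (v - cy) / fy * z, z, 1.0)
        # Move the point into the world frame with the rendering pose.
        p_world = tuple(
            sum(cam_to_world[i][j] * p_cam[j] for j in range(4))
            for i in range(3)
        )
        correspondences.append((query_px, p_world))
    return correspondences
```

Each returned pair links a pixel in the query image to a 3D world point. In a pipeline of this kind, such pairs would typically be handed to a PnP solver inside a RANSAC loop to recover the refined query pose; that step is omitted here.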