Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization

Christian Schmidt, Jens Piekenbrinck, Bastian Leibe
2024-10-11
Abstract: 3D Gaussian Splatting has recently emerged as a powerful tool for fast and accurate novel-view synthesis from a set of posed input images. However, like most novel-view synthesis approaches, it relies on accurate camera pose information, limiting its applicability in real-world scenarios where acquiring accurate camera poses can be challenging or even impossible. We propose an extension to the 3D Gaussian Splatting framework by optimizing the extrinsic camera parameters with respect to photometric residuals. We derive the analytical gradients and integrate their computation with the existing high-performance CUDA implementation. This enables downstream tasks such as 6-DoF camera pose estimation as well as joint reconstruction and camera refinement. In particular, we achieve rapid convergence and high accuracy for pose estimation on real-world scenes. Our method enables fast reconstruction of 3D scenes without requiring accurate pose information by jointly optimizing geometry and camera poses, while achieving state-of-the-art results in novel-view synthesis. Our approach is considerably faster to optimize than most competing methods, and several times faster in rendering. We show results on real-world scenes and complex trajectories through simulated environments, achieving state-of-the-art results on LLFF while reducing runtime by two to four times compared to the most efficient competing method. Source code will be available at https://github.com/Schmiddo/noposegs.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the problem of achieving high-quality novel view synthesis (NVS) without accurate camera pose information. Standard 3D Gaussian Splatting relies on accurate camera poses, which are often difficult or impossible to obtain in practice. The authors propose an extension of the 3D Gaussian Splatting framework that optimizes the extrinsic camera parameters (i.e., the camera pose), so that novel view synthesis no longer requires accurate pose initialization.

### Main Contributions

1. **Differentiable Camera Pose Estimation**: The authors propose a differentiable camera pose estimation method based on Gaussian Splatting and integrate it into the existing CUDA implementation, enabling fast optimization.
2. **Robustness Enhancement**: To make the method more robust to noisy pose initialization, the authors introduce an anisotropic loss term that prevents rapid convergence to suboptimal local minima and overfitting to the training views.
3. **Experimental Validation**: Experiments show that the method achieves state-of-the-art novel view synthesis and pose estimation results on real-world scenes and on complex trajectories through simulated environments, while reducing runtime on LLFF by two to four times compared to the most efficient competing method.

### Method Overview

- **Review of Gaussian Splatting**: 3D Gaussian Splatting represents a 3D scene as a set of anisotropic Gaussians, each parameterized by a 3D mean, a covariance matrix, a feature vector, and an opacity. Given the camera's intrinsic and extrinsic parameters, a differentiable rasterizer renders images from this representation.
- **Camera Pose Optimization**: The authors model the camera pose as an element of the Lie group SE(3), derive the gradients of the photometric residuals with respect to the pose, and integrate their computation into the existing CUDA rendering kernel.
- **Joint Reconstruction and Pose Optimization**: During optimization, the Gaussian parameters and the camera poses are updated simultaneously, yielding both a scene reconstruction and refined pose estimates.

### Experimental Results

- **Camera Pose Estimation**: On the LLFF dataset, the method converges much faster with a single pose hypothesis than the multi-hypothesis optimization method piNeRF.
- **Novel View Synthesis**: On the LLFF, Replica, and Tanks and Temples datasets, the method generates high-quality novel views without accurate pose information, with significantly reduced runtime.

### Conclusion

This paper proposes a method that achieves high-quality novel view synthesis without accurate pose initialization by means of differentiable camera pose optimization. The method achieves state-of-the-art results on multiple datasets and is efficient at runtime.
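
### Illustrative Sketch: Updating a Pose on SE(3)

The summary states that the camera pose is modeled as an element of SE(3) and refined by gradient-based optimization. As a rough illustration of what such a parameterization can look like, the following pure-Python sketch maps a 6-vector twist to a rigid transform with the SE(3) exponential map and applies it as a pose update. This is not the authors' implementation: the twist ordering (translation first, then rotation), the left-multiplicative update, and the helper names `skew`, `matmul`, `se3_exp`, and `update_pose` are all assumptions made for this example, and the paper's analytical gradients and CUDA integration are not reproduced here.

```python
import math


def skew(w):
    """Return the 3x3 skew-symmetric matrix K with K v = w x v."""
    wx, wy, wz = w
    return [[0.0, -wz, wy],
            [wz, 0.0, -wx],
            [-wy, wx, 0.0]]


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def se3_exp(xi):
    """Map a twist xi = (v, w) in R^6 to a 4x4 rigid transform.

    Assumed ordering: xi[:3] is the translational part v,
    xi[3:] is the rotational part w (axis times angle).
    """
    v, w = xi[:3], xi[3:]
    theta = math.sqrt(sum(c * c for c in w))
    K = skew(w)
    K2 = matmul(K, K)
    if theta < 1e-8:
        # Small-angle limits of the coefficients, avoiding division by zero.
        a, b, c = 1.0, 0.5, 1.0 / 6.0
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
        c = (theta - math.sin(theta)) / theta ** 3
    eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    # Rodrigues' formula: R = I + a K + b K^2
    R = [[eye[i][j] + a * K[i][j] + b * K2[i][j] for j in range(3)]
         for i in range(3)]
    # Translation: t = V v with V = I + b K + c K^2
    V = [[eye[i][j] + b * K[i][j] + c * K2[i][j] for j in range(3)]
         for i in range(3)]
    t = [sum(V[i][k] * v[k] for k in range(3)) for i in range(3)]
    return [R[0] + [t[0]], R[1] + [t[1]], R[2] + [t[2]],
            [0.0, 0.0, 0.0, 1.0]]


def update_pose(T, xi):
    """Apply a left-multiplicative update (an assumed convention): exp(xi) * T."""
    return matmul(se3_exp(xi), T)


# Usage: rotate an identity pose by 90 degrees about the z-axis.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
rotated = update_pose(identity, [0.0, 0.0, 0.0, 0.0, 0.0, math.pi / 2])
```

In a pose optimizer, the twist would typically be a small step derived from the gradient of the photometric loss. Keeping the update on the group in this way means the rotation block stays a valid rotation matrix without re-orthogonalization.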