Abstract: 3D Gaussian Splatting has recently emerged as a powerful tool for fast and accurate novel-view synthesis from a set of posed input images. However, like most novel-view synthesis approaches, it relies on accurate camera pose information, limiting its applicability in real-world scenarios where acquiring accurate camera poses can be challenging or even impossible. We propose an extension to the 3D Gaussian Splatting framework by optimizing the extrinsic camera parameters with respect to photometric residuals. We derive the analytical gradients and integrate their computation with the existing high-performance CUDA implementation. This enables downstream tasks such as 6-DoF camera pose estimation as well as joint reconstruction and camera refinement. In particular, we achieve rapid convergence and high accuracy for pose estimation on real-world scenes. Our method enables fast reconstruction of 3D scenes without requiring accurate pose information by jointly optimizing geometry and camera poses, while achieving state-of-the-art results in novel-view synthesis. Our approach is considerably faster to optimize than most competing methods, and several times faster in rendering. We show results on real-world scenes and complex trajectories through simulated environments, achieving state-of-the-art results on LLFF while reducing runtime by two to four times compared to the most efficient competing method. Source code will be available at https://github.com/Schmiddo/noposegs.

What problem does this paper attempt to address?

This paper addresses the problem of achieving high-quality Novel View Synthesis (NVS) without accurate camera pose information. Standard 3D Gaussian Splatting relies on accurate camera poses, which are often difficult or impossible to obtain in practice. To remove this requirement, the authors extend the 3D Gaussian Splatting framework so that the extrinsic camera parameters (i.e., the camera poses) are optimized as well, which allows novel view synthesis without accurate pose initialization.

### Main Contributions

1. **Differentiable Camera Pose Estimation**: The authors propose a differentiable camera pose estimation method based on Gaussian Splatting and integrate it into the existing CUDA implementation, enabling fast optimization.
2. **Robustness Enhancement**: To make the method more robust to noisy pose initialization, the authors introduce an anisotropic loss term that discourages rapid convergence to suboptimal local minima and overfitting to the training views.
3. **Experimental Validation**: Experiments show that the method achieves state-of-the-art novel view synthesis and pose estimation results on real-world scenes and complex trajectories while significantly reducing runtime.

### Method Overview

- **Review of Gaussian Splatting**: 3D Gaussian Splatting represents a 3D scene as a set of anisotropic Gaussians, each parameterized by a 3D mean, a covariance matrix, appearance features (spherical-harmonic color coefficients in the original 3D Gaussian Splatting formulation), and an opacity. Given the camera's intrinsic and extrinsic parameters, a differentiable rasterizer renders images from this representation.
- **Camera Pose Optimization**: The authors model the camera pose as an element of the SE(3) Lie group, derive the corresponding gradients, and integrate their computation into the existing CUDA rendering kernel for efficient camera pose optimization.
- **Joint Reconstruction and Pose Optimization**: During optimization, the Gaussian parameters and the camera poses are updated simultaneously, yielding both high-quality novel view synthesis and pose estimates.

### Experimental Results

- **Camera Pose Estimation**: On the LLFF dataset, the method converges much faster with a single pose hypothesis than the multi-hypothesis optimization method piNeRF.
- **Novel View Synthesis**: On the LLFF, Replica, and Tanks and Temples datasets, the method generates high-quality novel views without accurate pose information and with significantly reduced runtime.

### Conclusion

This paper proposes a method that achieves high-quality novel view synthesis without accurate pose initialization through differentiable camera pose optimization. The method achieves state-of-the-art results on multiple datasets and is efficient at runtime.
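
The pose optimization step summarized above (a pose as an element of SE(3), refined with analytical gradients) can be illustrated with a small, dependency-free sketch. This is not the authors' CUDA implementation: the photometric residual is replaced by a toy 3D point-alignment residual so that the example stays short, and the function names (`se3_exp`, `loss_and_grad`, `refine_pose`), the learning rate, and the test points are all assumptions made for illustration.

```python
import math

I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def hat(w):
    """Skew-symmetric matrix of w, so that hat(w) applied to v equals cross(w, v)."""
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def se3_exp(xi):
    """Exponential map: twist xi = (v, w) in se(3) -> 4x4 rigid transform."""
    v, w = xi[:3], xi[3:]
    theta = math.sqrt(sum(c * c for c in w))
    if theta < 1e-8:
        a, b, c = 1.0, 0.5, 1.0 / 6.0  # small-angle limits of the coefficients
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
        c = (theta - math.sin(theta)) / theta ** 3
    W = hat(w)
    W2 = matmul(W, W)
    # Rodrigues formula for the rotation, and the matrix V that maps v to the translation.
    R = [[I3[i][j] + a * W[i][j] + b * W2[i][j] for j in range(3)] for i in range(3)]
    V = [[I3[i][j] + b * W[i][j] + c * W2[i][j] for j in range(3)] for i in range(3)]
    t = [sum(V[i][k] * v[k] for k in range(3)) for i in range(3)]
    return [R[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def transform(T, p):
    """Apply the rigid transform T to a 3D point p."""
    return [sum(T[i][k] * p[k] for k in range(3)) + T[i][3] for i in range(3)]

def loss_and_grad(T, points, targets):
    """Squared alignment error and its analytical gradient w.r.t. a left
    perturbation exp(delta) @ T, evaluated at delta = 0.
    (Toy stand-in for a photometric residual; an assumption of this sketch.)"""
    loss, grad = 0.0, [0.0] * 6
    for p, q in zip(points, targets):
        y = transform(T, p)
        r = [y[i] - q[i] for i in range(3)]
        loss += sum(c * c for c in r)
        y_cross_r = cross(y, r)
        for i in range(3):
            grad[i] += 2.0 * r[i]              # translational part
            grad[3 + i] += 2.0 * y_cross_r[i]  # rotational part
    return loss, grad

def refine_pose(T, points, targets, lr=0.02, steps=200):
    """Gradient descent on the manifold: T <- exp(-lr * grad) @ T."""
    for _ in range(steps):
        _, g = loss_and_grad(T, points, targets)
        T = matmul(se3_exp([-lr * gi for gi in g]), T)
    return T

points = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
T_true = se3_exp([0.1, -0.05, 0.2, 0.0, 0.0, 0.3])  # hypothetical ground-truth pose
targets = [transform(T_true, p) for p in points]

T0 = se3_exp([0.0] * 6)  # start from the identity pose
initial_loss, _ = loss_and_grad(T0, points, targets)
T_est = refine_pose(T0, points, targets)
final_loss, _ = loss_and_grad(T_est, points, targets)
print(f"loss before: {initial_loss:.6f}, after: {final_loss:.6f}")
```

Updating the pose through the exponential map keeps every iterate a valid rigid transform, which a plain additive update on the 4x4 matrix entries would not guarantee. In the joint setting described above, the same kind of update would run alongside the updates of the Gaussian parameters.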

Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization

No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images

A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose

GGRt: Towards Pose-free Generalizable 3D Gaussian Splatting in Real-time

PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting

COLMAP-Free 3D Gaussian Splatting

Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis

Structure Consistent Gaussian Splatting with Matching Prior for Few-shot Novel View Synthesis

Few-shot Novel View Synthesis using Depth Aware 3D Gaussian Splatting

GGRt: Towards Generalizable 3D Gaussians Without Pose Priors in Real-Time

USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting

Object Gaussian for Monocular 6D Pose Estimation from Sparse Views

DynSUP: Dynamic Gaussian Splatting from An Unposed Image Pair

KeyGS: A Keyframe-Centric Gaussian Splatting Method for Monocular Image Sequences

IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera

Camera Pose Estimation Using a 3D Gaussian Splatting Radiance Field

Feature Splatting for Better Novel View Synthesis with Low Overlap

Self-Calibrating 4D Novel View Synthesis from Monocular Videos Using Gaussian Splatting

FewViewGS: Gaussian Splatting with Few View Matching and Multi-stage Training

WE-GS: An In-the-wild Efficient 3D Gaussian Representation for Unconstrained Photo Collections

GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis