Abstract: Novel View Synthesis (NVS) from unconstrained photo collections is challenging in computer graphics. Recently, 3D Gaussian Splatting (3DGS) has shown promise for photorealistic and real-time NVS of static scenes. Building on 3DGS, we propose an efficient point-based differentiable rendering framework for scene reconstruction from photo collections. Our key innovation is a residual-based spherical harmonic coefficients transfer module that adapts 3DGS to varying lighting conditions and photometric post-processing. This lightweight module can be pre-computed and ensures efficient gradient propagation from rendered images to 3D Gaussian attributes. Additionally, we observe that the appearance encoder and the transient mask predictor, the two most critical parts of NVS from unconstrained photo collections, can be mutually beneficial. We introduce a plug-and-play lightweight spatial attention module to simultaneously predict transient occluders and a latent appearance representation for each image. After training and preprocessing, our method aligns with the standard 3DGS format and rendering pipeline, facilitating seamless integration into various 3DGS applications. Extensive experiments on diverse datasets show that our approach outperforms existing approaches in the rendering quality of novel view and appearance synthesis, with fast convergence and high rendering speed.
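The pre-computation step mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' implementation: the function names `predict_residual` and `bake_appearance` are hypothetical, and a single linear map stands in for whatever network the paper actually uses. The idea shown is only that a residual is added to each Gaussian's base spherical harmonic (SH) coefficients, so that once an appearance vector is fixed, the adjusted coefficients can be computed once and stored in the ordinary 3DGS layout.

```python
def predict_residual(features, weights):
    """Hypothetical linear head: one residual value per SH coefficient."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def bake_appearance(base_sh, positions, appearance, weights):
    """Return per-Gaussian SH coefficients adjusted for one appearance vector.

    base_sh:    N lists of flattened SH coefficients (one list per Gaussian)
    positions:  N (x, y, z) Gaussian centres
    appearance: latent appearance vector for the chosen image (assumed given)
    weights:    one row per SH coefficient, each of length 3 + len(appearance)
    """
    baked = []
    for sh, pos in zip(base_sh, positions):
        # Condition the residual on where the Gaussian is and on the appearance.
        features = list(pos) + list(appearance)
        residual = predict_residual(features, weights)
        # Residual formulation: adjusted = base + predicted offset.
        baked.append([c + r for c, r in zip(sh, residual)])
    return baked


# Toy data: two Gaussians, two SH coefficients each, a 1-D appearance vector.
base_sh = [[0.5, 0.1], [0.2, -0.3]]
positions = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
appearance = [1.0]
weights = [[0.0, 0.0, 0.0, 0.1],   # first coefficient reacts to appearance
           [0.0, 0.0, 0.0, 0.0]]   # second coefficient is left unchanged
baked_sh = bake_appearance(base_sh, positions, appearance, weights)
```

Because `baked_sh` has the same shape as `base_sh`, a renderer that expects plain SH coefficients needs no extra network at render time in this sketch.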

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses real-scene reconstruction and novel view synthesis (NVS) from unconstrained photo collections. Specifically, it focuses on the following key challenges:

1. **Illumination conditions and post-processing variations**: Photos in unconstrained collections are taken at different times and locations, so their illumination conditions and post-processing may vary greatly. These variations degrade the quality of scene reconstruction and novel view synthesis.
2. **Prediction of transient occluders**: Images in unconstrained photo collections may contain transient occluders such as pedestrians and vehicles. Accurately predicting these occluders is crucial for high-quality novel view and appearance synthesis.
3. **Temporal-spatial efficiency**: Existing methods usually introduce additional learnable parameters and training strategies to handle unconstrained photo collections, which slows convergence and prevents real-time rendering and fast training.
4. **Compact data storage**: Existing methods often require a large amount of storage for large-scale unconstrained photo collections, which limits their use in practical applications.

### Solutions

To address these challenges, the paper proposes **WE-GS**, an in-the-wild efficient 3D Gaussian representation: a point-based differentiable rendering framework for reconstructing scenes from unconstrained photo collections. The main innovations of WE-GS include:

1. **Residual-based spherical harmonic coefficient transfer module**: This module adapts to different illumination conditions and post-processing by learning image-specific residual spherical harmonic coefficients. It is lightweight and pre-computable, ensuring efficient gradient propagation while retaining the efficiency of vanilla 3DGS.
2. **Lightweight spatial attention module**: This module simultaneously predicts transient occluder masks and latent appearance representations, improving both the accuracy of the masks and the representativeness of the appearance representations. The design exploits the mutual benefit between the appearance encoder and the transient occluder predictor.
3. **Optimization process**: Training combines multiple loss terms (such as an L1 loss, a structural similarity index (SSIM) loss, and a regularization loss) so that the model optimizes its parameters efficiently and produces high-quality novel view and appearance synthesis results.

### Experimental results

The paper reports extensive experiments on multiple datasets, including the PhotoTourism dataset and the NeRF-OSR dataset. The results show that WE-GS reaches a new state of the art in training speed, rendering frame rate (FPS), and the quality of novel view and novel appearance synthesis. Specifically:

- On the PhotoTourism dataset, while maintaining real-time rendering speed, WE-GS reduces the storage requirement by more than a factor of 2 and increases the average PSNR by 6.6 dB.
- On the NeRF-OSR dataset, WE-GS outperforms other methods on metrics such as PSNR, SSIM, and LPIPS.

### Summary

By introducing the residual-based spherical harmonic coefficient transfer module and the lightweight spatial attention module, WE-GS enables efficient, high-quality scene reconstruction and novel view synthesis from unconstrained photo collections. The method reaches a new state of the art in rendering quality while also performing well in temporal-spatial efficiency and data storage.
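To make the combination of loss terms listed under Solutions more concrete, here is a minimal sketch of a masked photometric loss. It is an illustration under stated assumptions rather than the paper's objective: the weights `lam` and `beta` are placeholders (the `lam = 0.2` default mirrors the L1/D-SSIM weighting of vanilla 3DGS), the SSIM here is a simplified single-window version over the whole image, the regularizer is one plausible choice, and `static_weight` is a hypothetical name for one minus the predicted transient mask. Images are flattened lists of intensities in [0, 1].

```python
def masked_l1(rendered, target, static_weight):
    """Mean absolute error, counting only pixels weighted as static."""
    total_weight = sum(static_weight)
    if total_weight == 0:
        return 0.0
    err = sum(w * abs(r - t) for r, t, w in zip(rendered, target, static_weight))
    return err / total_weight


def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified SSIM using one window over the whole image.

    Standard SSIM averages this statistic over local windows instead.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return num / den


def total_loss(rendered, target, static_weight, lam=0.2, beta=0.1):
    """(1 - lam) * masked L1 + lam * (1 - SSIM) + beta * mask regularizer."""
    l1 = masked_l1(rendered, target, static_weight)
    dssim = 1.0 - global_ssim(rendered, target)
    # Penalize masked-out pixels so the mask cannot simply hide every error.
    mask_reg = sum(1.0 - w for w in static_weight) / len(static_weight)
    return (1 - lam) * l1 + lam * dssim + beta * mask_reg


# Toy example: the last target pixel is covered by a transient occluder.
rendered = [0.5, 0.5, 0.5, 0.5]
target = [0.5, 0.5, 0.5, 1.0]
keep_all = [1.0, 1.0, 1.0, 1.0]
drop_last = [1.0, 1.0, 1.0, 0.0]
l1_unmasked = masked_l1(rendered, target, keep_all)
l1_masked = masked_l1(rendered, target, drop_last)
```

In the toy example, masking the occluded pixel removes its contribution to the L1 term, while the regularizer charges a small cost for every pixel the mask discards.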

WE-GS: An In-the-wild Efficient 3D Gaussian Representation for Unconstrained Photo Collections

Wild-GS: Real-Time Novel View Synthesis from Unconstrained Photo Collections

AAGS: Appearance-Aware 3D Gaussian Splatting with Unconstrained Photo Collections

Gaussian in the Wild: 3D Gaussian Splatting for Unconstrained Image Collections

Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections

GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views

Unbounded-GS: Extending 3D Gaussian Splatting with Hybrid Representation for Unbounded Large-Scale Scene Reconstruction

Optimizing 3D Gaussian Splatting for Sparse Viewpoint Scene Reconstruction

MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo

Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting

GGRt: Towards Pose-free Generalizable 3D Gaussian Splatting in Real-time

CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians

3D-HGS: 3D Half-Gaussian Splatting

Self-Calibrating 4D Novel View Synthesis from Monocular Videos Using Gaussian Splatting

3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors

Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis

Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis

SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior

MVPGS: Excavating Multi-view Priors for Gaussian Splatting from Sparse Input Views

Dynamic 3D Gaussian Fields for Urban Areas

HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting