Abstract: We introduce pixelSplat, a feed-forward model that learns to reconstruct 3D radiance fields parameterized by 3D Gaussian primitives from pairs of images. Our model features real-time and memory-efficient rendering for scalable training as well as fast 3D reconstruction at inference time. To overcome local minima inherent to sparse and locally supported representations, we predict a dense probability distribution over 3D and sample Gaussian means from that probability distribution. We make this sampling operation differentiable via a reparameterization trick, allowing us to back-propagate gradients through the Gaussian splatting representation. We benchmark our method on wide-baseline novel view synthesis on the real-world RealEstate10k and ACID datasets, where we outperform state-of-the-art light field transformers and accelerate rendering by 2.5 orders of magnitude while reconstructing an interpretable and editable 3D radiance field.
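The sampling step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code and rests on several assumptions: a hypothetical per-pixel network output of logits over B depth buckets, buckets spaced uniformly in disparity between assumed near and far planes, and camera rays whose z-component is 1 so that the ray parameter equals depth. For each pixel it samples one bucket, places the Gaussian mean at that bucket's depth along the pixel's ray, and sets the Gaussian's opacity to the probability of the sampled bucket. The function name `sample_gaussian_means` is illustrative only.

```python
import numpy as np


def sample_gaussian_means(logits, near, far, ray_origins, ray_dirs, rng):
    """Sample one Gaussian mean and opacity per pixel from a discrete depth distribution.

    logits:      (P, B) unnormalized scores over B depth buckets (hypothetical network output)
    ray_origins: (P, 3) ray origins; ray_dirs: (P, 3) directions with z == 1, so t equals depth
    """
    num_pixels, num_buckets = logits.shape
    # Softmax over buckets gives a per-pixel probability distribution over depth.
    shifted = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = shifted / shifted.sum(axis=1, keepdims=True)

    # Assumed bucket layout: uniform in disparity (1 / depth) between near and far.
    disparity_edges = np.linspace(1.0 / near, 1.0 / far, num_buckets + 1)
    bucket_depths = 1.0 / (0.5 * (disparity_edges[:-1] + disparity_edges[1:]))

    # Discrete sampling of one bucket per pixel: this step has no gradient on its own.
    idx = np.array([rng.choice(num_buckets, p=probs[i]) for i in range(num_pixels)])

    # Place each Gaussian mean at the sampled depth along its pixel ray.
    means = ray_origins + bucket_depths[idx][:, None] * ray_dirs

    # Reparameterization: opacity is the probability of the sampled bucket,
    # so a rendering loss can send gradients back into probs through the opacity.
    opacity = probs[np.arange(num_pixels), idx]
    return means, opacity, idx, probs


rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 8))            # 4 pixels, 8 depth buckets
origins = np.zeros((4, 3))                  # all rays start at the camera center
dirs = np.tile([0.0, 0.0, 1.0], (4, 1))     # rays along the optical axis for simplicity
means, opacity, idx, probs = sample_gaussian_means(logits, 1.0, 100.0, origins, dirs, rng)
```

Because `opacity` equals `probs[i, idx[i]]`, an autodiff framework would route the gradient of a rendering loss into the predicted probabilities through the opacity, even though the discrete draw itself is not differentiable. Roughly speaking, a sampled location whose Gaussian helps the rendering has its opacity, and therefore its probability, pushed up.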

What problem does this paper attempt to address?

### Main Problems Addressed by the Paper

This paper addresses generalizable novel view synthesis from sparse image observations, specifically from pairs of images. It targets the following problems:

1. **Efficient rendering and training**: Existing methods based on differentiable rendering consume large amounts of memory and time. The paper proposes `pixelSplat`, a feed-forward model whose Gaussian splatting representation renders in real time and with little memory, which makes training scalable and 3D reconstruction fast at inference time.
2. **Overcoming local minima**: Representations built from sparse, locally supported 3D Gaussian primitives easily get stuck in local minima when Gaussian positions are predicted directly. Instead, the model predicts a dense probability distribution over 3D (for each pixel, a distribution over depth along its ray) and samples the Gaussian means from it. A reparameterization trick makes this sampling differentiable, so gradients can backpropagate through the Gaussian splatting representation.
3. **Resolving scale ambiguity**: Camera poses in real-world datasets are reconstructed only up to an arbitrary per-scene scale factor. The paper designs a multi-view epipolar transformer that reliably infers this scale factor for each scene.
4. **Producing editable 3D representations**: Unlike methods that only accelerate rendering without reconstructing an interpretable or editable 3D scene representation, the model reconstructs an interpretable and editable 3D radiance field from an image pair.
5. **Improved performance**: For novel view synthesis on the real-world RealEstate10k and ACID datasets, the method outperforms state-of-the-art light field transformers in Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS), while rendering 2.5 orders of magnitude faster.
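The scale ambiguity in item 3 can be made concrete with a small NumPy sketch, assuming a pinhole camera with world-to-camera pose `(R, t)` (so `x_cam = R @ X + t`) and made-up intrinsics, poses, and scene point. Multiplying every scene point and every camera translation by the same factor `s` leaves all projected pixels unchanged, so the images alone cannot reveal `s`. Triangulating a cross-view correspondence with the given, arbitrarily scaled poses, however, recovers geometry in exactly that pose scale. Here classical linear (DLT) triangulation stands in for the correspondence reasoning the transformer learns; `project` and `triangulate` are hypothetical helpers.

```python
import numpy as np


def project(K, R, t, X):
    """Project world points X (N, 3) with a pinhole camera: x_cam = R @ X + t."""
    cam = X @ R.T + t                    # world -> camera coordinates
    uv = cam @ K.T                       # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]        # perspective divide


def triangulate(K, poses, pixels):
    """Linear (DLT) triangulation of one 3D point from its pixel in several views."""
    rows = []
    for (R, t), (u, v) in zip(poses, pixels):
        P = K @ np.hstack([R, t[:, None]])     # 3x4 projection matrix
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.stack(rows))
    X_h = vt[-1]                               # null vector = homogeneous 3D point
    return X_h[:3] / X_h[3]


K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
t_a, t_b = np.zeros(3), np.array([-0.5, 0.0, 0.0])   # second camera shifted sideways
X = np.array([[0.2, 0.1, 3.0]])                      # one scene point at depth 3
s = 3.7                                              # unknown per-scene scale factor

# 1) Scaling scene and translations together leaves the images unchanged.
px_a, px_b = project(K, R, t_a, X), project(K, R, t_b, X)
unchanged = np.allclose(px_b, project(K, R, s * t_b, s * X))

# 2) Triangulating the same correspondence with scaled poses yields geometry in that scale.
X_true = triangulate(K, [(R, t_a), (R, t_b)], [px_a[0], px_b[0]])
X_scaled = triangulate(K, [(R, t_a), (R, s * t_b)], [px_a[0], px_b[0]])
```

The pixel coordinates agree for any positive `s`, while the triangulated point comes out as `s * X_true`: once two views and their poses are available, depth is tied to whatever scale the poses happen to use.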
### Technical Innovations

- **Pixel-aligned 3D Gaussian prediction**: For each pixel, the model predicts a probability distribution over the position of its 3D Gaussian primitive rather than regressing the position directly, which avoids the local-minimum problem.
- **Multi-view epipolar transformer**: By finding correspondences between the two views along epipolar lines and encoding the depths of those correspondences with positional encodings, the model resolves the scale ambiguity.
- **Differentiable parameterization and sampling**: Setting each Gaussian primitive's opacity equal to the probability of its sampled depth bucket makes the sampling operation differentiable, so the model can be trained effectively.
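The epipolar idea behind the second bullet can be sketched in NumPy under the pinhole convention `x_cam = R @ X + t`, with assumed intrinsics, poses, depth range, and a simple sin/cos depth encoding; the paper's exact encoding and feature sampling are not reproduced here. For one pixel in view A, the sketch places samples at increasing depths along that pixel's ray, projects them into view B, where they fall on the pixel's epipolar line, and attaches a positional encoding of each sample's depth. In an epipolar transformer, features read at these view-B locations, tagged with the depth encodings, would serve as keys and values for attention from the view-A pixel. All function names are hypothetical.

```python
import numpy as np


def project(K, R, t, X):
    """Pinhole projection of world points X (N, 3); x_cam = R @ X + t."""
    cam = X @ R.T + t
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]


def backproject(K, R, t, uv, depths):
    """World points at the given z-depths along the ray through pixel uv."""
    ray = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])   # camera-space ray with z == 1
    X_cam = depths[:, None] * ray[None, :]                   # (S, 3) samples along the ray
    return (X_cam - t) @ R                                   # camera -> world: R.T @ (X_cam - t)


def depth_encoding(depths, num_freqs=4):
    """Sinusoidal encoding of each sample's depth (assumed form)."""
    angles = depths[:, None] * (2.0 ** np.arange(num_freqs))[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)   # (S, 2 * num_freqs)


def epipolar_samples(K, pose_a, pose_b, uv_a, near, far, num_samples):
    """Pixel locations in view B along the epipolar line of uv_a, plus depth encodings."""
    depths = np.linspace(near, far, num_samples)
    X = backproject(K, *pose_a, uv_a, depths)
    return project(K, *pose_b, X), depth_encoding(depths)


theta = 0.1                                   # small rotation of camera B about the y-axis
R_b = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                [0.0, 1.0, 0.0],
                [-np.sin(theta), 0.0, np.cos(theta)]])
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
pose_a = (np.eye(3), np.zeros(3))
pose_b = (R_b, np.array([-0.3, 0.0, 0.0]))
uv_b, enc = epipolar_samples(K, pose_a, pose_b, (350.0, 260.0), 1.0, 10.0, 16)
```

Matching the view-A pixel's feature against features at these locations picks out the sample whose depth is consistent with the given poses; because the poses carry the scene's arbitrary scale, the selected depth is expressed in that same scale.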

pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction

SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction

Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs

MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images

HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction

SmileSplat: Generalizable Gaussian Splats for Unconstrained Sparse Images

Splatter Image: Ultra-Fast Single-View 3D Reconstruction

Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives

FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor Scenes

3D Gaussian Splatting for Real-Time Radiance Field Rendering

TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers

3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes

3D-HGS: 3D Half-Gaussian Splatting

latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction

GS-Octree: Octree-based 3D Gaussian Splatting for Robust Object-level 3D Reconstruction Under Strong Lighting

2D Gaussian Splatting for Geometrically Accurate Radiance Fields

Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections

SpotlessSplats: Ignoring Distractors in 3D Gaussian Splatting

HDRSplat: Gaussian Splatting for High Dynamic Range 3D Scene Reconstruction from Raw Images

SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting

HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors