A Pixel Is Worth More Than One 3D Gaussians in Single-View 3D Reconstruction

Jianghao Shen, Nan Xue, Tianfu Wu

2024-06-03

Abstract: Learning 3D scene representation from a single-view image is a long-standing fundamental problem in computer vision, with inherent ambiguity in predicting contents unseen from the input view. Built on the recently proposed 3D Gaussian Splatting (3DGS), the Splatter Image method has made promising progress on fast single-image novel view synthesis by learning a single 3D Gaussian for each pixel based on the U-Net feature map of an input image. However, it has limited expressive power to represent occluded components that are not observable in the input view. To address this problem, this paper presents a Hierarchical Splatter Image method in which a pixel is worth more than one 3D Gaussians. Specifically, each pixel is represented by a parent 3D Gaussian and a small number of child 3D Gaussians. Parent 3D Gaussians are learned as in the vanilla Splatter Image. Child 3D Gaussians are learned via a lightweight Multi-Layer Perceptron (MLP) which takes as input the projected image features of a parent 3D Gaussian and the embedding of a target camera view. Both parent and child 3D Gaussians are learned end-to-end in a stage-wise way. The joint conditioning on input image features from the eyes of the parent Gaussians and on the target camera position facilitates learning to allocate child Gaussians to "see the unseen", recovering the occluded details that are often missed by parent Gaussians. In experiments, the proposed method is tested on the ShapeNet-SRN and CO3D datasets and obtains state-of-the-art performance, especially showing promising capabilities of reconstructing contents that are occluded in the input view.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses single-view 3D reconstruction, a fundamental challenge in computer vision: recovering 3D scene information (geometry and appearance) from a single 2D image is inherently ambiguous, because content hidden from the input view has to be inferred. It proposes a method called "Hierarchical Splatter Image" (HSI) to improve Splatter Image, a fast single-view method built on 3D Gaussian Splatting (3DGS). 3DGS represents a scene as a large set of 3D Gaussians, and Splatter Image predicts exactly one 3D Gaussian per input pixel, which limits its ability to represent occluded parts of an object that are not observable in the input view.

The paper instead represents each pixel with a parent 3D Gaussian and a small number of child 3D Gaussians. Parent Gaussians are learned as in the original Splatter Image, while child Gaussians are learned by a lightweight multi-layer perceptron (MLP) that takes as input the projected image features of a parent Gaussian and an embedding of the target camera view. Conditioning on both inputs helps the model allocate child Gaussians to recover occluded details that parent Gaussians often miss.

Experiments on the ShapeNet-SRN and CO3D datasets report state-of-the-art performance, with promising results in reconstructing content that is occluded in the input view. The main contributions are extending the pixel-to-3D-Gaussian mapping from one-to-one to one-to-many, introducing a hierarchical parent-child 3D Gaussian representation, and proposing an effective way to learn parent and child Gaussians end-to-end in a stage-wise manner.
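The one-to-many mapping described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the per-child parameter layout (`PARAMS_PER_CHILD`), the choice to place children as offsets from the parent mean, the layer widths, and all function names are assumptions made for illustration, and random arrays stand in for the U-Net features and the camera embedding.

```python
import numpy as np

# Assumed per-child layout (hypothetical): 3 offset + 3 log-scale
# + 4 rotation quaternion + 1 opacity logit + 3 colour logits.
PARAMS_PER_CHILD = 14

def init_mlp(rng, sizes):
    """Random (weight, bias) pairs for an MLP with layer widths `sizes`."""
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Apply the MLP with ReLU on every layer except the last."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_children(parent_feats, parent_means, cam_embed, params, num_children):
    """Predict `num_children` child Gaussians for every parent Gaussian.

    parent_feats: (P, F) image features sampled at each parent (one per pixel)
    parent_means: (P, 3) parent centres
    cam_embed:    (C,)   embedding of the target camera view
    """
    num_parents = parent_feats.shape[0]
    # Every parent sees the same target-camera embedding.
    cam = np.broadcast_to(cam_embed, (num_parents, cam_embed.shape[0]))
    x = np.concatenate([parent_feats, cam], axis=1)
    out = mlp(params, x).reshape(num_parents, num_children, PARAMS_PER_CHILD)
    offset, log_scale, quat, opacity, rgb = np.split(out, [3, 6, 10, 11], axis=-1)
    return {
        # Assumption: children are positioned relative to their parent.
        "means": parent_means[:, None, :] + offset,
        "scales": np.exp(log_scale),
        "rotations": quat / (np.linalg.norm(quat, axis=-1, keepdims=True) + 1e-8),
        "opacities": sigmoid(opacity[..., 0]),
        "colors": sigmoid(rgb),
    }

# Toy usage: a 4x4 "image", so 16 parents, each with K = 3 children.
rng = np.random.default_rng(0)
H, W, F, C, K = 4, 4, 8, 6, 3
parent_feats = rng.standard_normal((H * W, F))
parent_means = rng.standard_normal((H * W, 3))
cam_embed = rng.standard_normal(C)
params = init_mlp(rng, [F + C, 32, K * PARAMS_PER_CHILD])
children = predict_children(parent_feats, parent_means, cam_embed, params, K)
print(children["means"].shape)  # (16, 3, 3)
```

In this toy setup each pixel contributes 1 + K Gaussians, so the 4x4 input yields 64 Gaussians instead of 16. Because the camera embedding is part of the MLP input, the same parent can produce different children for different target views.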

