Abstract: Recent advancements in 3D reconstruction from single images have been driven by the evolution of generative models. Prominent among these are methods based on Score Distillation Sampling (SDS) and the adaptation of diffusion models to the 3D domain. Despite their progress, these techniques are often limited by slow optimization or rendering, which leads to long training and optimization times. In this paper, we introduce a novel approach for single-view reconstruction that efficiently generates a 3D model from a single image via feed-forward inference. Our method uses two transformer-based networks, a point decoder and a triplane decoder, to reconstruct 3D objects using a hybrid Triplane-Gaussian intermediate representation. This hybrid representation strikes a balance: it renders faster than implicit representations while delivering higher rendering quality than explicit representations. The point decoder generates a point cloud from a single image, providing an explicit representation that the triplane decoder then uses to query Gaussian features for each point. This design addresses the difficulty of directly regressing explicit 3D Gaussian attributes, which are unstructured by nature. Subsequently, the 3D Gaussians are decoded by an MLP to enable rapid rendering through splatting. Both decoders are built upon a scalable, transformer-based architecture and have been efficiently trained on large-scale 3D datasets. Evaluations on both synthetic datasets and real-world images demonstrate that our method achieves higher quality and a faster runtime than previous state-of-the-art techniques. Please see our project page at https://zouzx.github.io/TriplaneGaussian/.
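
The per-point feature query described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes points normalized to [-1, 1]^3, three axis-aligned feature planes combined by concatenation, bilinear interpolation, and a toy single linear layer standing in for the MLP. The names `bilinear_sample`, `query_triplane`, and `decode_gaussian` are hypothetical and chosen only for illustration.

```python
import math


def bilinear_sample(plane, u, v):
    """Bilinearly sample a feature plane at (u, v) in [-1, 1]^2.

    `plane` is an R x R grid (list of rows) of C-dimensional feature lists.
    """
    res = len(plane)
    # Clamp, then map [-1, 1] to continuous grid coordinates [0, res - 1].
    x = (min(max(u, -1.0), 1.0) + 1.0) * 0.5 * (res - 1)
    y = (min(max(v, -1.0), 1.0) + 1.0) * 0.5 * (res - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, res - 1), min(y0 + 1, res - 1)
    fx, fy = x - x0, y - y0
    channels = len(plane[0][0])
    return [
        (1 - fx) * (1 - fy) * plane[y0][x0][c]
        + fx * (1 - fy) * plane[y0][x1][c]
        + (1 - fx) * fy * plane[y1][x0][c]
        + fx * fy * plane[y1][x1][c]
        for c in range(channels)
    ]


def query_triplane(planes, point):
    """Project a 3D point onto the XY, XZ, and YZ planes and concatenate
    the three sampled features (assumption: concatenation, not summation)."""
    x, y, z = point
    return (
        bilinear_sample(planes["xy"], x, y)
        + bilinear_sample(planes["xz"], x, z)
        + bilinear_sample(planes["yz"], y, z)
    )


def decode_gaussian(point, feature, weights):
    """Toy stand-in for the MLP: one linear layer with 4 output rows.

    Row 0 is an opacity logit; rows 1-3 are per-axis log-scales.
    """
    raw = [sum(w * f for w, f in zip(row, feature)) for row in weights]
    opacity = 1.0 / (1.0 + math.exp(-raw[0]))  # sigmoid keeps opacity in (0, 1)
    scale = [math.exp(r) for r in raw[1:4]]    # exp keeps scales positive
    return {"position": point, "opacity": opacity, "scale": scale}


# Usage: a 2 x 2 plane with one channel, shared across the three planes.
plane = [[[0.0], [1.0]], [[2.0], [3.0]]]
planes = {"xy": plane, "xz": plane, "yz": plane}
feature = query_triplane(planes, (0.0, 0.0, 0.0))
gaussian = decode_gaussian((0.0, 0.0, 0.0), feature, [[0.0] * 3] * 4)
```

In this sketch the point cloud supplies the Gaussian positions explicitly, so the decoder only has to predict the remaining attributes from features that vary smoothly over the planes. The opacity and scale parameterization here is an assumption; the abstract does not specify which attributes the MLP outputs.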

R4D-planes: Remapping Planes for Novel View Synthesis and Self-Supervised Decoupling of Monocular Videos

MPS-NeRF: Generalizable 3D Human Rendering from Multiview Images

DaRePlane: Direction-aware Representations for Dynamic Scene Reconstruction

DRSM: efficient neural 4d decomposition for dynamic reconstruction in stationary monocular cameras

PlanarRecon: Realtime 3D Plane Detection and Reconstruction from Posed Monocular Videos

Detachable Novel Views Synthesis of Dynamic Scenes Using Distribution-Driven Neural Radiance Fields

HexPlane: A Fast Representation for Dynamic Scenes

Remote Sensing Novel View Synthesis With Implicit Multiplane Representations

Fast View Synthesis of Casual Videos with Soup-of-Planes

SNeRF: Stylized Neural Implicit Representations for 3D Scenes

Learning Unified Decompositional and Compositional NeRF for Editable Novel View Synthesis

DaReNeRF: Direction-aware Representation for Dynamic Scenes

4D-Rotor Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes

Self-Calibrating 4D Novel View Synthesis from Monocular Videos Using Gaussian Splatting

TK-Planes: Tiered K-Planes with High Dimensional Feature Vectors for Dynamic UAV-based Scenes

High-Fidelity and Real-Time Novel View Synthesis for Dynamic Scenes

Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers

DynPoint: Dynamic Neural Point For View Synthesis

Neural Radiance Flow for 4D View Synthesis and Video Processing

Real-time dense 3D reconstruction and camera tracking via embedded planes representation

Multi-Plane Neural Radiance Fields for Novel View Synthesis