Abstract: Recent advancements in 3D reconstruction from single images have been driven by the evolution of generative models. Prominent among these are methods based on Score Distillation Sampling (SDS) and the adaptation of diffusion models to the 3D domain. Despite their progress, these techniques are often limited by slow optimization or rendering, which leads to long training and optimization times. In this paper, we introduce a novel approach for single-view reconstruction that efficiently generates a 3D model from a single image via feed-forward inference. Our method uses two transformer-based networks, a point decoder and a triplane decoder, to reconstruct 3D objects using a hybrid Triplane-Gaussian intermediate representation. This hybrid representation strikes a balance: it renders faster than implicit representations while delivering higher rendering quality than explicit representations. The point decoder generates a point cloud from a single image, providing an explicit representation that the triplane decoder then uses to query Gaussian features for each point. This design addresses the difficulty of directly regressing explicit 3D Gaussian attributes, which are unstructured by nature. Subsequently, the 3D Gaussians are decoded by an MLP to enable rapid rendering through splatting. Both decoders are built upon a scalable, transformer-based architecture and have been efficiently trained on large-scale 3D datasets. Evaluations on both synthetic datasets and real-world images demonstrate that our method achieves higher quality and a faster runtime than previous state-of-the-art techniques. Please see our project page at https://zouzx.github.io/TriplaneGaussian/.
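
As a rough illustration of the "query Gaussian features for each point" step, the sketch below projects each 3D point onto three axis-aligned feature planes and samples them. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names `bilinear_sample` and `query_triplane`, the bilinear interpolation, the plane ordering (xy, xz, yz), the [-1, 1] coordinate range, and the concatenation of the three per-plane features are all assumptions that the abstract does not specify. The MLP that decodes the features into Gaussian attributes is not shown.

```python
import numpy as np

def bilinear_sample(plane, uv):
    """Sample a feature plane at continuous 2D locations.

    plane: (C, H, W) feature map; uv: (N, 2) coordinates in [-1, 1].
    Returns an (N, C) array. Bilinear interpolation is an assumption.
    """
    C, H, W = plane.shape
    # Map [-1, 1] to pixel coordinates in [0, W-1] and [0, H-1].
    x = np.clip((uv[:, 0] + 1.0) * 0.5 * (W - 1), 0, W - 1)
    y = np.clip((uv[:, 1] + 1.0) * 0.5 * (H - 1), 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = x - x0
    wy = y - y0
    # Each gather has shape (C, N).
    top = plane[:, y0, x0] * (1 - wx) + plane[:, y0, x1] * wx
    bot = plane[:, y1, x0] * (1 - wx) + plane[:, y1, x1] * wx
    return (top * (1 - wy) + bot * wy).T

def query_triplane(planes, points):
    """Query per-point features from three planes.

    planes: (3, C, H, W), assumed to be ordered xy, xz, yz.
    points: (N, 3) coordinates in [-1, 1].
    Returns (N, 3C): the per-plane features concatenated (an assumption).
    """
    f_xy = bilinear_sample(planes[0], points[:, [0, 1]])
    f_xz = bilinear_sample(planes[1], points[:, [0, 2]])
    f_yz = bilinear_sample(planes[2], points[:, [1, 2]])
    return np.concatenate([f_xy, f_xz, f_yz], axis=1)
```

In a pipeline of the kind the abstract describes, `points` would come from the point decoder and `planes` from the triplane decoder, and the returned per-point features would be passed to an MLP that outputs the Gaussian attributes used for splatting.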

PlankAssembly: Robust 3D Reconstruction from Three Orthographic Views with Learnt Shape Programs

An Example-Based Approach to 3D Man-Made Object Reconstruction from Line Drawings

Model-driven sketch reconstruction with structure-oriented retrieval

Embedding Visual Cognition in 3D Reconstruction from Multi-View Engineering Drawings

Learning to Infer and Execute 3D Shape Programs

SECAD-Net: Self-Supervised CAD Reconstruction by Learning Sketch-Extrude Operations

Unsupervised 3D Shape Reconstruction by Part Retrieval and Assembly

Deep Learning Assisted Optimization for 3D Reconstruction from Single 2D Line Drawings

Learning to Reconstruct 3D Structure from Object Motion.

PlaneFormers: From Sparse View Planes to 3D Reconstruction

Three cheers for the pain pump?

Point2CAD: Reverse Engineering CAD Models from 3D Point Clouds

Deep3DSketch: 3D modeling from Free-hand Sketches with View- and Structural-Aware Adversarial Training

Look, Cast and Mold: Learning 3D Shape Manifold from Single-view Synthetic Data.

Multi-view 3D Reconstruction with Transformer

Img2CAD: Reverse Engineering 3D CAD Models from Images through VLM-Assisted Conditional Factorization

Geometric Point Attention Transformer for 3D Shape Reassembly

3D VR Sketch Guided 3D Shape Prototyping and Exploration

Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers

A Coarse-to-Fine Transformer-Based Network for 3D Reconstruction from Non-Overlapping Multi-View Images

Learning 3D Object Shape and Layout without 3D Supervision