Abstract: Recent advancements in 3D reconstruction from single images have been driven by the evolution of generative models. Prominent among these are methods based on Score Distillation Sampling (SDS) and the adaptation of diffusion models to the 3D domain. Despite their progress, these techniques are often limited by slow optimization or rendering, which leads to long training and optimization times. In this paper, we introduce a novel approach for single-view reconstruction that efficiently generates a 3D model from a single image via feed-forward inference. Our method utilizes two transformer-based networks, namely a point decoder and a triplane decoder, to reconstruct 3D objects using a hybrid Triplane-Gaussian intermediate representation. This hybrid representation strikes a balance: it renders faster than implicit representations while delivering higher rendering quality than explicit representations. The point decoder generates a point cloud from a single image, offering an explicit representation that the triplane decoder then uses to query Gaussian features for each point. This design choice addresses the difficulty of directly regressing explicit 3D Gaussian attributes, which are non-structural in nature. Subsequently, the 3D Gaussians are decoded by an MLP to enable rapid rendering through splatting. Both decoders are built upon a scalable, transformer-based architecture and have been efficiently trained on large-scale 3D datasets. Evaluations on both synthetic datasets and real-world images demonstrate that our method achieves higher quality and a faster runtime than previous state-of-the-art techniques. Please see our project page at https://zouzx.github.io/TriplaneGaussian/.
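
The per-point feature query described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`bilinear_sample`, `query_triplane`, `decode_gaussians`), the concatenation of the three plane features, the single linear layer standing in for the MLP, and the 14-value attribute layout are all assumptions made for illustration.

```python
import numpy as np

def bilinear_sample(plane, uv):
    """Bilinearly sample a feature plane.

    plane: (C, H, W) feature map, H and W >= 2.
    uv:    (N, 2) coordinates in [-1, 1] (first column indexes W, second H).
    Returns an (N, C) array of interpolated features.
    """
    C, H, W = plane.shape
    # Map [-1, 1] to continuous pixel coordinates (corner-aligned).
    x = (uv[:, 0] + 1.0) * 0.5 * (W - 1)
    y = (uv[:, 1] + 1.0) * 0.5 * (H - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx = np.clip(x - x0, 0.0, 1.0)
    wy = np.clip(y - y0, 0.0, 1.0)
    f00 = plane[:, y0, x0]          # each lookup has shape (C, N)
    f01 = plane[:, y0, x0 + 1]
    f10 = plane[:, y0 + 1, x0]
    f11 = plane[:, y0 + 1, x0 + 1]
    out = (f00 * (1 - wx) * (1 - wy) + f01 * wx * (1 - wy)
           + f10 * (1 - wx) * wy + f11 * wx * wy)
    return out.T

def query_triplane(planes, points):
    """Project each 3D point onto the xy, xz, and yz planes and gather features.

    planes: dict with keys "xy", "xz", "yz", each a (C, H, W) array.
    points: (N, 3) array with coordinates in [-1, 1].
    Returns (N, 3C). Concatenation is an assumption; summation is another option.
    """
    f_xy = bilinear_sample(planes["xy"], points[:, [0, 1]])
    f_xz = bilinear_sample(planes["xz"], points[:, [0, 2]])
    f_yz = bilinear_sample(planes["yz"], points[:, [1, 2]])
    return np.concatenate([f_xy, f_xz, f_yz], axis=1)

def decode_gaussians(points, features, w, b):
    """Hypothetical one-layer stand-in for the MLP that decodes Gaussians.

    Assumed layout of the 14 outputs: 3 position offset, 3 log-scale,
    4 rotation quaternion, 1 opacity logit, 3 color logits.
    """
    raw = features @ w + b
    quat = raw[:, 6:10]
    return {
        "position": points + raw[:, 0:3],
        "scale": np.exp(raw[:, 3:6]),
        "rotation": quat / (np.linalg.norm(quat, axis=1, keepdims=True) + 1e-8),
        "opacity": 1.0 / (1.0 + np.exp(-raw[:, 10:11])),
        "color": 1.0 / (1.0 + np.exp(-raw[:, 11:14])),
    }

# Usage with random stand-ins for the two decoders' outputs.
rng = np.random.default_rng(0)
planes = {k: rng.normal(size=(4, 8, 8)) for k in ("xy", "xz", "yz")}
points = rng.uniform(-1.0, 1.0, size=(5, 3))   # stand-in for the point decoder output
features = query_triplane(planes, points)       # (5, 12)
gaussians = decode_gaussians(points, features,
                             rng.normal(size=(12, 14)), np.zeros(14))
```

Anchoring each Gaussian at a predicted point and reading its attributes from a structured feature grid is what lets the network avoid regressing an unordered set of Gaussian parameters directly.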

GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding

GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction

GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting

Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting

PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer

GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction

CodedVTR: Codebook-based Sparse Voxel Transformer with Geometric Guidance

Spatial Transformer for 3D Point Clouds

Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction

COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction

LeanGaussian: Breaking Pixel or Point Cloud Correspondence in Modeling 3D Gaussians

OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction

GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane

GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving

S^3Gaussian: Self-Supervised Street Gaussians for Autonomous Driving

SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction

HybridOcc: NeRF Enhanced Transformer-based Multi-Camera 3D Occupancy Prediction

Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers

SEGT: A General Spatial Expansion Group Transformer for nuScenes Lidar-based Object Detection Task