Survey on Fundamental Deep Learning 3D Reconstruction Techniques

Yonge Bai, LikHang Wong, TszYin Twan
2024-07-11
Abstract: This survey aims to investigate fundamental deep learning (DL)-based 3D reconstruction techniques that produce photo-realistic 3D models and scenes, highlighting Neural Radiance Fields (NeRFs), Latent Diffusion Models (LDM), and 3D Gaussian Splatting. We dissect the underlying algorithms, evaluate their strengths and tradeoffs, and project future research trajectories in this rapidly evolving field. We provide a comprehensive overview of the fundamentals of DL-driven 3D scene reconstruction, offering insights into the techniques' potential applications and limitations.
Computer Vision and Pattern Recognition, Graphics
What problem does this paper attempt to address?
The paper investigates deep learning (DL)-based 3D reconstruction techniques that generate photo-realistic 3D models and scenes. It focuses on the following methods:

1. **Neural Radiance Fields (NeRF)**:
   - NeRF synthesizes novel views of a complex scene from a set of input views by optimizing a model that approximates a continuous volumetric scene or surface.
   - NeRF represents the volume as a multi-layer perceptron (MLP). Its input is a 5-dimensional vector (x, y, z, θ, ϕ), a 3D position plus a viewing direction, and its output is a 4-dimensional vector (R, G, B, σ): the RGB color and the volume density at that point.
2. **Latent Diffusion Models (LDM)**:
   - Latent Diffusion Models combine diffusion models with a variational autoencoder (VAE). Running the diffusion process in the VAE's compressed latent space instead of pixel space makes generating high-quality images more efficient and scalable.
   - During training, an LDM first uses the VAE to learn an encoding function E(x) and a decoding function D(z), and then trains an Attention-U-Net on the encoded latents as the denoising model.
3. **3D Gaussian Splatting**:
   - 3D Gaussian Splatting represents a scene as a set of 3D Gaussian functions, enabling high-quality, real-time rendering of reconstructed 3D scenes.
   - The method first generates a sparse point cloud with Structure from Motion (SfM), uses it to initialize the Gaussians, and then optimizes their positions, sizes (covariances), and opacities, adaptively adding and pruning Gaussians to control their density.

The core objective of the paper is to analyze these fundamental DL 3D reconstruction techniques, evaluate their advantages and trade-offs, and predict future research directions. The paper also discusses the potential applications and limitations of these techniques.
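The NeRF formulation summarized above (an MLP mapping a 5D input to color and density, from which new views are rendered along camera rays) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the survey's or the original authors' implementation: the layer sizes, the random untrained weights, and the helper names `radiance_field` and `render_ray` are hypothetical, positional encoding is omitted, and compositing uses the standard NeRF quadrature, where each sample gets opacity α_i = 1 − exp(−σ_i δ_i) and is weighted by the transmittance T_i accumulated in front of it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MLP: 5 -> 64 -> 64 -> 4 with random (untrained) weights.
# A real NeRF adds positional encoding, more layers, and injects the viewing
# direction near the output; all of that is omitted in this sketch.
W1, b1 = rng.normal(0, 0.5, (5, 64)), np.zeros(64)
W2, b2 = rng.normal(0, 0.5, (64, 64)), np.zeros(64)
W3, b3 = rng.normal(0, 0.5, (64, 4)), np.zeros(4)


def radiance_field(inputs):
    """Map rows of (x, y, z, theta, phi) to RGB colors and densities sigma."""
    h = np.maximum(inputs @ W1 + b1, 0.0)        # ReLU
    h = np.maximum(h @ W2 + b2, 0.0)             # ReLU
    out = h @ W3 + b3
    rgb = 1.0 / (1.0 + np.exp(-out[:, :3]))      # sigmoid keeps colors in (0, 1)
    sigma = np.maximum(out[:, 3], 0.0)           # density must be non-negative
    return rgb, sigma


def render_ray(origin, direction, theta, phi, near=2.0, far=6.0, n_samples=64):
    """Composite one pixel color along a ray with the NeRF quadrature rule."""
    t = np.linspace(near, far, n_samples)
    points = origin + t[:, None] * direction             # sample positions, (n, 3)
    angles = np.tile([theta, phi], (n_samples, 1))       # same view direction per sample
    rgb, sigma = radiance_field(np.hstack([points, angles]))
    delta = np.append(np.diff(t), 1e10)                  # last interval treated as infinite
    alpha = 1.0 - np.exp(-sigma * delta)                 # opacity of each interval
    # Transmittance T_i = prod_{j<i} (1 - alpha_j): light surviving to sample i.
    trans = np.cumprod(np.append(1.0, 1.0 - alpha[:-1]))
    weights = trans * alpha
    return (weights[:, None] * rgb).sum(axis=0), weights


color, weights = render_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]), theta=0.3, phi=1.2)
print("pixel color:", color, "accumulated opacity:", weights.sum())
```

Training a real NeRF would adjust the MLP weights so that colors rendered this way match the pixels of the input views; the sketch only shows the forward pass that makes such optimization possible.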
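The two-stage LDM training described above can be sketched as follows, assuming a deliberately toy setup: a random linear map and its pseudo-inverse stand in for the learned encoder E(x) and decoder D(z), the noise schedule is the common linear DDPM schedule, and any callable `denoiser(z_t, t)` takes the place of the Attention-U-Net. All names here (`encode`, `decode`, `add_noise`, `predict_z0`, `training_loss`) are hypothetical. What the sketch is meant to show is that noise is added to, and predicted in, the small latent z rather than the full image x.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the autoencoder: compress a 64-dim "image" to an
# 8-dim latent. A real LDM uses a trained convolutional autoencoder instead.
ENC = rng.normal(size=(64, 8)) / 8.0
DEC = np.linalg.pinv(ENC)                      # shape (8, 64)


def encode(x):                                 # plays the role of E(x)
    return x @ ENC


def decode(z):                                 # plays the role of D(z); lossy for images
    return z @ DEC


T = 1000
betas = np.linspace(1e-4, 0.02, T)             # linear DDPM noise schedule
alpha_bar = np.cumprod(1.0 - betas)            # cumulative signal fraction per step


def add_noise(z0, t, eps):
    """Forward diffusion q(z_t | z_0), applied in latent space."""
    return np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps


def predict_z0(zt, t, eps_hat):
    """Invert the forward step given a noise estimate eps_hat."""
    return (zt - np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])


def training_loss(x, denoiser):
    """One LDM-style step: encode, add noise, regress the added noise."""
    z0 = encode(x)
    t = rng.integers(0, T)                     # one timestep per batch (simplification)
    eps = rng.normal(size=z0.shape)
    zt = add_noise(z0, t, eps)
    return np.mean((eps - denoiser(zt, t)) ** 2)


x = rng.normal(size=(4, 64))                   # a batch of four toy "images"
loss = training_loss(x, lambda zt, t: np.zeros_like(zt))   # untrained placeholder denoiser
print("loss with a zero denoiser:", loss)
```

At generation time a trained denoiser is applied repeatedly to turn pure latent noise into a clean latent, and only the final latent is passed through `decode`, which is where the efficiency gain of working in the compressed space comes from.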
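The Gaussian scene representation described above can be illustrated with a small NumPy sketch. It assumes the widely used parameterization of each Gaussian by a mean, a per-axis scale, a rotation quaternion, an opacity, and a color (the original method stores view-dependent color as spherical harmonics; a constant RGB is used here), builds the covariance as Σ = R S Sᵀ Rᵀ so it stays positive semi-definite during optimization, and alpha-composites depth-sorted Gaussians at a single pixel. To stay short it assumes an orthographic camera looking down +z, so projecting a Gaussian just keeps the top-left 2×2 block of Σ; the real rasterizer uses a perspective projection, screen tiles, and a GPU sort. The function names and dictionary layout are hypothetical.

```python
import numpy as np


def quat_to_rotmat(q):
    """Unit quaternion (w, x, y, z) -> 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def covariance_3d(scale, quat):
    """Sigma = R S S^T R^T: symmetric positive semi-definite by construction."""
    M = quat_to_rotmat(quat) @ np.diag(scale)
    return M @ M.T


def render_pixel(pixel, gaussians):
    """Front-to-back alpha compositing of Gaussians at one 2D pixel position.

    Assumes an orthographic camera looking down +z: smaller z is nearer, and
    the projected 2D covariance is the top-left 2x2 block of the 3D one.
    """
    color = np.zeros(3)
    transmittance = 1.0
    for g in sorted(gaussians, key=lambda g: g["mean"][2]):   # nearest first
        cov2d = covariance_3d(g["scale"], g["quat"])[:2, :2]
        d = pixel - g["mean"][:2]
        power = -0.5 * d @ np.linalg.solve(cov2d, d)           # Gaussian falloff exponent
        alpha = min(0.99, g["opacity"] * np.exp(power))        # clamp below 1 for stability
        if alpha < 1.0 / 255.0:                                # skip negligible contributions
            continue
        color += transmittance * alpha * g["rgb"]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:                               # pixel is effectively opaque
            break
    return color


gaussians = [
    {"mean": np.array([0.0, 0.0, 2.0]), "scale": np.array([0.5, 0.5, 0.5]),
     "quat": np.array([1.0, 0.0, 0.0, 0.0]), "opacity": 0.8, "rgb": np.array([1.0, 0.0, 0.0])},
    {"mean": np.array([0.2, 0.0, 1.0]), "scale": np.array([0.3, 0.6, 0.2]),
     "quat": np.array([0.9, 0.0, 0.0, 0.44]), "opacity": 0.5, "rgb": np.array([0.0, 0.0, 1.0])},
]
print(render_pixel(np.array([0.0, 0.0]), gaussians))
```

Because every step is differentiable with respect to the means, scales, rotations, opacities, and colors, a gradient-based optimizer can adjust them so rendered pixels match the input photos, which is the optimization stage that follows SfM initialization.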