2L3: Lifting Imperfect Generated 2D Images into Accurate 3D

Yizheng Chen,Rengan Xie,Qi Ye,Sen Yang,Zixuan Xie,Tianxiao Chen,Rong Li,Yuchi Huo

2024-01-29

Abstract: Reconstructing 3D objects from a single image is an intriguing but challenging problem. One promising solution is to utilize multi-view (MV) 3D reconstruction to fuse generated MV images into consistent 3D objects. However, the generated images usually suffer from inconsistent lighting, misaligned geometry, and sparse views, leading to poor reconstruction quality. To cope with these problems, we present a novel 3D reconstruction framework that leverages intrinsic decomposition guidance, transient-mono prior guidance, and view augmentation to address the three issues, respectively. Specifically, we first leverage intrinsic decomposition to decouple the shading information from the generated images to reduce the impact of inconsistent lighting; then, we introduce a mono prior with view-dependent transient encoding to enhance the reconstructed normals; and finally, we design a view augmentation fusion strategy that minimizes pixel-level loss in generated sparse views and semantic loss in augmented random views, resulting in view-consistent geometry and detailed textures. Our approach, therefore, enables the integration of a pre-trained MV image generator and a neural network-based volumetric signed distance function (SDF) representation for single-image-to-3D object reconstruction. We evaluate our framework on various datasets and demonstrate its superior performance in both quantitative and qualitative assessments, indicating a significant advance in 3D object reconstruction. Compared with the latest state-of-the-art method SyncDreamer (Liu et al., 2023), we reduce the Chamfer Distance error by about 36% and improve PSNR by about 30%.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses how to reconstruct accurate 3D objects from imperfect generated 2D images. When existing multi-view 3D reconstruction methods are applied to images produced by generative models, they often encounter inconsistent lighting, geometric misalignment, and sparse views, all of which degrade reconstruction quality. To address these issues, the paper proposes a new 3D reconstruction framework that uses intrinsic decomposition guidance, transient-mono prior guidance, and view augmentation to solve the three problems, respectively. Specifically, the authors first reduce the impact of inconsistent lighting by decoupling the shading information from the generated images. They then introduce a monocular normal prior with view-dependent transient encoding to improve the reconstructed normals. Finally, they design a view augmentation fusion strategy that minimizes pixel-level loss in the generated sparse views and semantic loss in augmented random views, yielding view-consistent geometry and detailed textures. By combining a pre-trained multi-view image generator with a neural network-based volumetric SDF representation, the paper improves the reconstruction of 3D objects from single images, texts, or other input conditions. Experimental results show that the method outperforms existing techniques on multiple datasets, reducing Chamfer Distance error by about 36% and improving Peak Signal-to-Noise Ratio (PSNR) by about 30% relative to SyncDreamer. In summary, the main contributions of the paper are as follows:

1. Proposing a multi-view 3D reconstruction method for imperfect generated images, which can be easily applied to 3D generation based on 2D image generation.
2. Introducing intrinsic decomposition to address inconsistent lighting, improve reconstruction quality, and obtain reflectance components.
3. Improving the geometric detail and consistency of generated 3D objects through a normal prior model and view-dependent transient geometry encoding.
4. Designing a view augmentation scheme that generates semantic guidance from densely sampled random views, alleviating the insufficient supervision caused by sparse views.
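
The two metrics cited above, Chamfer Distance and PSNR, can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the function names are hypothetical, and the Chamfer convention used here (mean nearest-neighbour Euclidean distance, summed over both directions) is an assumption, since papers differ on squared versus unsquared distances and on normalization.

```python
import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer Distance between point sets a (N, 3) and b (M, 3).

    Assumed convention: mean nearest-neighbour Euclidean distance,
    summed over the a->b and b->a directions.
    """
    # Pairwise Euclidean distances via broadcasting, shape (N, M).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # For each point, distance to its nearest neighbour in the other set.
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())


def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak Signal-to-Noise Ratio in dB for images with values in [0, max_val]."""
    mse = float(np.mean((pred - target) ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


if __name__ == "__main__":
    # Toy data: two points, and the same two points shifted by 1 along z.
    pts_a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    pts_b = pts_a + np.array([0.0, 0.0, 1.0])
    print("Chamfer Distance:", chamfer_distance(pts_a, pts_b))

    # Toy images: a constant error of 0.1 on a [0, 1] intensity scale.
    img_pred = np.zeros((4, 4, 3))
    img_true = np.full((4, 4, 3), 0.1)
    print("PSNR (dB):", psnr(img_pred, img_true))
```

Lower Chamfer Distance means the reconstructed surface lies closer to the ground-truth geometry, while higher PSNR means rendered views are closer to the reference images. The dense pairwise matrix costs O(N·M) memory, so large point clouds would normally use a spatial index such as a k-d tree instead.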
