Abstract: The task of novel view synthesis aims to generate unseen perspectives of an object or scene from a limited set of input images. However, synthesizing novel views from a single image remains a significant challenge in computer vision. Previous approaches tackle this problem by adopting mesh prediction, multi-plane image construction, or more advanced techniques such as neural radiance fields. Recently, a pre-trained diffusion model specifically designed for 2D image synthesis has demonstrated its capability to produce photorealistic novel views when sufficiently optimized on a 3D fine-tuning task. Although fidelity and generalizability are greatly improved, training such a powerful diffusion model requires a vast volume of training data and model parameters, resulting in notoriously long training times and high computational costs. To tackle this issue, we propose Efficient-3DiM, a simple but effective framework to learn a single-image novel-view synthesizer. Motivated by our in-depth analysis of the inference process of diffusion models, we propose several pragmatic strategies to reduce the training overhead to a manageable scale, including a crafted timestep sampling strategy, a superior 3D feature extractor, and an enhanced training scheme. When combined, these strategies reduce the total training time from 10 days to less than 1 day on the same computational platform (one instance with 8 Nvidia A100 GPUs). Comprehensive experiments are conducted to demonstrate the efficiency and generalizability of our proposed method.

ReE3D: Boosting Novel View Synthesis for Monocular Images Using Residual Encoders

3D-Aware Image Synthesis Via Learning Structural and Textural Representations

Virtual View Generation Based on 3D-Dense-attentive GAN Networks

View Independent Generative Adversarial Network for Novel View Synthesis

G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images

Make Encoder Great Again in 3D GAN Inversion through Geometry and Occlusion-Aware Encoding

ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis

TriPlaneNet: An Encoder for EG3D Inversion

Novel View Synthesis from only a 6-DoF Camera Pose by Two-stage Networks

High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization

Efficient-3DiM: Learning a Generalizable Single-image Novel-view Synthesizer in One Day

Dual Encoder GAN Inversion for High-Fidelity 3D Head Reconstruction from Single Images

Learning Detailed Radiance Manifolds for High-Fidelity and 3D-Consistent Portrait Synthesis from Monocular Image

3D GAN Inversion with Pose Optimization

Generative Novel View Synthesis with 3D-Aware Diffusion Models

Meta-Auxiliary Network for 3D GAN Inversion

CompNVS: Novel View Synthesis with Scene Completion

Novel View Synthesis with Pixel-Space Diffusion Models

Generative View Synthesis: From Single-view Semantics to Novel-view Images

Bridging Implicit and Explicit Geometric Transformation for Single-Image View Synthesis