Abstract: In this paper, we present a novel diffusion model called SyncDreamer that generates multiview-consistent images from a single-view image. Using pretrained large-scale 2D diffusion models, recent work Zero123 demonstrates the ability to generate plausible novel views from a single-view image of an object. However, maintaining consistency in geometry and colors across the generated images remains a challenge. To address this issue, we propose a synchronized multiview diffusion model that models the joint probability distribution of multiview images, enabling the generation of multiview-consistent images in a single reverse process. SyncDreamer synchronizes the intermediate states of all the generated images at every step of the reverse process through a 3D-aware feature attention mechanism that correlates the corresponding features across different views. Experiments show that SyncDreamer generates images with high consistency across different views, making it well suited for various 3D generation tasks such as novel-view synthesis, text-to-3D, and image-to-3D.
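The idea of generating all views "in a single reverse process" can be illustrated with a toy sampler. This is a minimal sketch, not SyncDreamer's implementation: it assumes a standard DDPM update with a linear beta schedule, stacks the latents of all N views into one array that shares a single timestep, and replaces the real conditional denoising network with a hypothetical `toy_coupled_eps` function whose noise prediction for each view also depends on the other views. Conditioning on the input image and camera poses is omitted.

```python
import numpy as np


def toy_coupled_eps(x_t: np.ndarray, t: int) -> np.ndarray:
    """Hypothetical stand-in for a multiview noise predictor.

    x_t has shape (N, C, H, W): one noisy latent per view. The prediction
    for each view depends on all views (here via the cross-view mean), which
    is the coupling that makes the reverse process "synchronized". A real
    model would be a conditional UNet; `t` is unused in this toy.
    """
    shared = x_t.mean(axis=0, keepdims=True)  # information pooled across views
    return 0.5 * x_t + 0.5 * shared


def synchronized_ddpm_sample(num_views=4, shape=(4, 8, 8), steps=50, seed=0):
    """Denoise N view latents jointly with a shared DDPM timestep schedule."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, steps)  # assumed linear schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    # Every view starts from its own Gaussian noise, but all views
    # move through the same timesteps together.
    x = rng.standard_normal((num_views, *shape))
    for t in reversed(range(steps)):
        eps = toy_coupled_eps(x, t)  # one joint call covering all N views
        # DDPM posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean  # no noise is added at the final step
    return x
```

Calling `synchronized_ddpm_sample(num_views=4)` returns an array of shape `(4, 4, 8, 8)`, one latent per view. The design point is that the predictor is called once per step on the whole stack, so each view's update can see the current state of every other view; sampling each view independently would lose that coupling.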

What problem does this paper attempt to address?

The problem this paper attempts to solve is generating multiview-consistent images from a single-view image: given one image of an object, generate images of it from multiple viewpoints while keeping geometry and color consistent across those images. This problem matters in computer vision and graphics because existing methods struggle to keep geometric structure and color consistent when generating multiview images.

### Problem Background

1. **Limited 3D information**: Although neural networks have made great progress in extracting 3D information from images (e.g., Yao et al., 2018; Tewari et al., 2020), generating multiview-consistent images from a single view remains challenging because a single image carries very limited 3D information.
2. **Successes and limitations of diffusion models**: Diffusion models (e.g., Rombach et al., 2022; Ho et al., 2020) have been highly successful in 2D image generation, but directly training general-purpose 3D diffusion models usually requires large amounts of 3D data, and existing 3D datasets are not sufficient to capture the complexity of arbitrary 3D shapes.
3. **Shortcomings of existing methods**: Some methods generate 3D models by distilling pretrained text-to-image diffusion models, but to work from an input image they require textual inversion (Gal et al., 2022) to represent that image as a word embedding; generating a single shape takes a long time, and parameter tuning is cumbersome. Moreover, a single word embedding can hardly capture all the details of an image (such as category, appearance, and pose), which degrades the quality of the reconstructed 3D shape.

### Solutions Proposed in the Paper

To overcome these problems, the paper proposes a new framework named SyncDreamer, which generates multiview-consistent images from a single-view image.
Specifically:

- **Synchronized multiview diffusion model**: SyncDreamer keeps the generated multiview images consistent in geometry and color by synchronizing the intermediate states of all generated images throughout the reverse diffusion process.
- **3D-aware feature attention mechanism**: At every denoising step, SyncDreamer applies a 3D-aware feature attention mechanism that correlates corresponding features across different views, improving multiview consistency.
- **Efficient 3D reconstruction**: The generated multiview-consistent images can be used directly by 3D reconstruction methods such as NeRF or NeuS without special loss functions, which simplifies the reconstruction process.

### Experimental Results

Experiments show that SyncDreamer outperforms the baseline methods on the Google Scanned Objects dataset, generating more consistent images and reconstructing better 3D shapes. SyncDreamer also accepts 2D inputs in a variety of styles (such as cartoons, sketches, ink-wash paintings, and oil paintings), demonstrating its effectiveness at lifting 2D images to 3D.

In summary, SyncDreamer addresses the problem of generating multiview-consistent images from a single-view image by introducing a synchronized multiview diffusion model and a 3D-aware feature attention mechanism, providing a new and effective tool for 3D reconstruction tasks.
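The 3D-aware feature attention correlates corresponding features across views. One plausible way to sketch such attention, assumed here for illustration rather than taken from the text above, is to let each pixel of a target view attend over features sampled at D depths along its camera ray, since the pixel's corresponding point in any other view lies somewhere along the projection of that ray. The sketch assumes these per-depth features have already been gathered from the other views (that gathering step is not shown); the function name `depthwise_attention` and the projection matrices `w_q`, `w_k`, `w_v` are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def depthwise_attention(query, ray_feats, w_q, w_k, w_v):
    """Hypothetical attention along the depth samples of each pixel's ray.

    query:     (P, C)    features of P pixels in the target view
    ray_feats: (P, D, C) features at D depth samples along each pixel's ray,
                         assumed to be already aggregated from other views
    w_q, w_k, w_v: (C, C) hypothetical projection matrices
    returns:   (P, C)    updated per-pixel features
    """
    q = query @ w_q          # (P, C)
    k = ray_feats @ w_k      # (P, D, C)
    v = ray_feats @ w_v      # (P, D, C)
    # Scaled dot product between each pixel's query and its own D keys.
    scores = np.einsum("pc,pdc->pd", q, k) / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # (P, D): a distribution over depth
    return np.einsum("pd,pdc->pc", weights, v)
```

Because the softmax runs over the depth axis only, each pixel weights most heavily the depth samples whose features agree with its own, which is one way cross-view correspondence could enter the attention without an explicit depth map.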

SyncDreamer: Generating Multiview-consistent Images from a Single-view Image

FusionDreamer: Consistent Images Generation from Sparse-view Images

MVDream: Multi-view Diffusion for 3D Generation

DreamComposer: Controllable 3D Object Generation via Multi-View Conditions

VistaDream: Sampling multiview consistent images for single-view scene reconstruction

ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion

ViewFusion: Towards Multi-View Consistency via Interpolated Denoising

DreamSparse: Escaping from Plato's Cave with 2D Frozen Diffusion Model Given Sparse Views

ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation

EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion

Consistent-1-to-3: Consistent Image to 3D View Synthesis via Geometry-aware Diffusion Models

MultiDiff: Consistent Novel View Synthesis from a Single Image

MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion

Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model

2L3: Lifting Imperfect Generated 2D Images into Accurate 3D

Envision3D: One Image to 3D with Anchor Views Interpolation

EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior

Multi-view Image Prompted Multi-view Diffusion for Improved 3D Generation

Consistent123: Improve Consistency for One Image to 3D Object Synthesis

MVDiff: Scalable and Flexible Multi-View Diffusion for 3D Object Reconstruction from Single-View