Abstract: Recent advancements in open-world 3D object generation have been remarkable, with image-to-3D methods offering superior fine-grained control over their text-to-3D counterparts. However, most existing models fall short in simultaneously providing rapid generation speeds and high fidelity to input images, two features essential for practical applications. In this paper, we present One-2-3-45++, an innovative method that transforms a single image into a detailed 3D textured mesh in approximately one minute. Our approach aims to fully harness the extensive knowledge embedded in 2D diffusion models and priors from valuable yet limited 3D data. This is achieved by initially finetuning a 2D diffusion model for consistent multi-view image generation, followed by elevating these images to 3D with the aid of multi-view conditioned 3D native diffusion models. Extensive experimental evaluations demonstrate that our method can produce high-quality, diverse 3D assets that closely mirror the original input image. Our project webpage: https://sudo-ai-3d.github.io/One2345plus_page.
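The first stage described in the abstract, consistent multi-view image generation, presupposes a fixed set of target cameras around the object. The sketch below is illustrative only and rests on stated assumptions: six views at evenly spaced azimuths with alternating elevations, a camera radius of 1.5, and the six views tiled into a single 3x2 image so that one diffusion sample covers all views at once. None of these angles, the radius, or the tiling layout are given in the abstract above. Images are plain nested lists to keep the example dependency-free.

```python
import math
from typing import List, Tuple

Image = List[List[float]]  # grayscale image stored as rows of pixel values

# Illustrative assumption (not from the abstract): six views at azimuths
# 30, 90, ..., 330 degrees, with elevations alternating between two values.
AZIMUTHS_DEG = [30.0 + 60.0 * k for k in range(6)]
ELEVATIONS_DEG = [20.0 if k % 2 == 0 else -10.0 for k in range(6)]


def camera_position(azimuth_deg: float, elevation_deg: float,
                    radius: float = 1.5) -> Tuple[float, float, float]:
    """Camera center on a sphere around the object at the origin (z-up)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (radius * math.cos(el) * math.cos(az),
            radius * math.cos(el) * math.sin(az),
            radius * math.sin(el))


def tile_views(views: List[Image], rows: int = 3, cols: int = 2) -> Image:
    """Pack rows*cols equally sized views into one grid image, row-major."""
    assert len(views) == rows * cols
    h = len(views[0])
    grid: Image = []
    for r in range(rows):
        for y in range(h):
            line: List[float] = []
            for c in range(cols):
                line.extend(views[r * cols + c][y])
            grid.append(line)
    return grid


def untile_views(grid: Image, rows: int = 3, cols: int = 2) -> List[Image]:
    """Inverse of tile_views: cut the grid back into individual views."""
    h, w = len(grid) // rows, len(grid[0]) // cols
    return [[row[c * w:(c + 1) * w] for row in grid[r * h:(r + 1) * h]]
            for r in range(rows) for c in range(cols)]


if __name__ == "__main__":
    poses = [camera_position(a, e) for a, e in zip(AZIMUTHS_DEG, ELEVATIONS_DEG)]
    views = [[[float(i)] * 4 for _ in range(4)] for i in range(6)]  # dummy 4x4 views
    grid = tile_views(views)
    print(len(grid), len(grid[0]))  # 12 8
    assert untile_views(grid) == views
```

Generating all views jointly in one image is one way to let a single 2D diffusion pass see every view at once, which is the intuition behind consistency; whether and how the paper does this is not stated in the abstract.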

### What problems does this paper attempt to solve?

This paper addresses the problem of generating high-quality 3D objects from a single image, with particular emphasis on generation speed and fidelity to the input image. Specifically:

1. **The trade-off between speed and fidelity**: Most existing models cannot deliver both fast generation and high fidelity to the input image, yet both are essential for practical applications such as virtual reality, augmented reality, and game development.
2. **The scarcity of 3D data**: Because 3D training data is scarce, methods trained directly on 3D shape datasets struggle to generalize to unseen categories in open-world scenarios.
3. **Long optimization times**: Optimization-based methods such as DreamFusion and Magic3D can produce high-quality 3D objects, but generating a single shape can take tens of minutes or even hours, and they are prone to artifacts such as the multi-face (Janus) problem and oversaturated colors.
4. **Inconsistent multi-view images**: Earlier work such as One-2-3-45 used 2D diffusion models to generate multi-view images for 3D reconstruction, but the generated views are often mutually inconsistent, which degrades the reconstructed 3D shape.

To address these problems, the paper proposes **One-2-3-45++**, which achieves fast, high-quality 3D object generation in three steps:

- **Consistent multi-view generation**: A 2D diffusion model is fine-tuned to generate a consistent set of multi-view images.
- **Multi-view conditioned 3D diffusion**: A 3D diffusion model conditioned on these multi-view images lifts them to a textured 3D mesh.
- **Lightweight texture optimization**: A lightweight optimization step further refines the texture of the generated mesh.
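The second step, multi-view conditioned 3D diffusion, needs 3D features derived from the 2D views. One common way to build such conditioning, assumed here because the summary does not say how the paper does it, is to project each 3D grid point into every view with a pinhole camera and average the 2D features it lands on. The sketch below uses scalar features, nearest-pixel sampling, cameras that look at the origin, and a hypothetical focal length; it ignores occlusion, as simple projection-based feature volumes do.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
FeatureMap = List[List[float]]  # square map of scalar features, [row][col]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a: Vec3) -> Vec3:
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def project(point: Vec3, eye: Vec3, focal: float,
            size: int) -> Optional[Tuple[int, int]]:
    """Pinhole projection for a camera at `eye` looking at the origin (z-up world).

    Returns the nearest pixel (row, col), or None if the point is behind the
    camera or falls outside the size x size image. Degenerate if the camera
    sits directly above or below the origin (forward parallel to z).
    """
    forward = normalize(sub((0.0, 0.0, 0.0), eye))
    right = normalize(cross(forward, (0.0, 0.0, 1.0)))
    up = cross(right, forward)
    d = sub(point, eye)
    depth = dot(d, forward)
    if depth <= 0:
        return None
    center = (size - 1) / 2.0
    col = round(center + focal * dot(d, right) / depth)
    row = round(center - focal * dot(d, up) / depth)  # rows grow downward
    if 0 <= row < size and 0 <= col < size:
        return row, col
    return None


def aggregate_features(point: Vec3, eyes: Sequence[Vec3],
                       feature_maps: Sequence[FeatureMap],
                       focal: float) -> Optional[float]:
    """Average the features a 3D point lands on across the views that see it."""
    samples = []
    for eye, fmap in zip(eyes, feature_maps):
        pix = project(point, eye, focal, len(fmap))
        if pix is not None:
            samples.append(fmap[pix[0]][pix[1]])
    return sum(samples) / len(samples) if samples else None


if __name__ == "__main__":
    eyes = [(2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
    maps = [[[1.0] * 5 for _ in range(5)], [[3.0] * 5 for _ in range(5)]]
    print(aggregate_features((0.0, 0.0, 0.0), eyes, maps, focal=2.0))  # 2.0
```

Running `aggregate_features` over every cell of a voxel grid yields a feature volume that a 3D diffusion model could take as its condition; real systems typically use learned multi-channel features and bilinear sampling instead of scalars and nearest pixels.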
One-2-3-45++ can generate a high-fidelity textured 3D mesh from a single RGB image in about one minute, and in experimental evaluations it shows higher robustness and visual quality than existing methods.
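The final lightweight texture optimization step can be illustrated with a toy example. This is not the paper's procedure, which the summary does not describe; it is a minimal sketch assuming the mesh geometry is already fixed and that each vertex has an RGB color observed in each generated view where it is visible (the visibility encoding and colors are made-up inputs). Gradient descent on a mean squared-error loss over the visible views pulls each vertex color toward the average of its observations.

```python
from typing import List, Optional

Color = List[float]  # [r, g, b] in [0, 1]


def refine_vertex_colors(init: List[Color],
                         observations: List[List[Optional[Color]]],
                         lr: float = 0.1, steps: int = 200) -> List[Color]:
    """Gradient descent on mean over visible views of ||c_v - obs||^2.

    observations[view][vertex] is the color seen in that view, or None if the
    vertex is occluded there (a hypothetical visibility encoding). With the
    loss averaged over visible views, each step shrinks the error by a factor
    of (1 - 2 * lr), so any 0 < lr < 1 converges.
    """
    colors = [list(c) for c in init]  # copy so the input is not mutated
    for _ in range(steps):
        for v, color in enumerate(colors):
            grad = [0.0, 0.0, 0.0]
            seen = 0
            for view in observations:
                obs = view[v]
                if obs is None:
                    continue  # occluded in this view: no gradient contribution
                seen += 1
                for ch in range(3):
                    grad[ch] += 2.0 * (color[ch] - obs[ch])
            if seen == 0:
                continue  # never observed: keep the initial color
            for ch in range(3):
                color[ch] -= lr * grad[ch] / seen
    return colors


if __name__ == "__main__":
    init = [[0.0, 0.0, 0.0], [0.2, 0.2, 0.2]]
    obs = [
        [[1.0, 0.0, 0.0], None],  # view 0: vertex 0 looks red, vertex 1 hidden
        [[0.0, 0.0, 1.0], None],  # view 1: vertex 0 looks blue, vertex 1 hidden
    ]
    out = refine_vertex_colors(init, obs)
    print([round(x, 6) for x in out[0]], out[1])  # [0.5, 0.0, 0.5] [0.2, 0.2, 0.2]
```

Because only a small set of texture parameters is optimized against a handful of fixed views, a step like this is cheap compared with the per-shape optimization of methods such as DreamFusion.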

One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion

One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization

Novel 3D-Aware Composition Images Synthesis for Object Display with Diffusion Model.

Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image

Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors

EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior

Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation

Wonder3D: Single Image to 3D Using Cross-Domain Diffusion

Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models

Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion

Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model

Envision3D: One Image to 3D with Anchor Views Interpolation

HiFi-123: Towards High-fidelity One Image to 3D Content Generation

Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior

ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation

Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation

3D-Adapter: Geometry-Consistent Multi-View Diffusion for High-Quality 3D Generation

Gen-3Diffusion: Realistic Image-to-3D Generation via 2D & 3D Diffusion Synergy

Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation

Generic 3D Diffusion Adapter Using Controlled Multi-View Editing