One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion

Minghua Liu, Ruoxi Shi, Linghao Chen, Zhuoyang Zhang, Chao Xu, Xinyue Wei, Hansheng Chen, Chong Zeng, Jiayuan Gu, Hao Su
DOI: https://doi.org/10.48550/arXiv.2311.07885
2023-11-14
Abstract: Recent advancements in open-world 3D object generation have been remarkable, with image-to-3D methods offering superior fine-grained control over their text-to-3D counterparts. However, most existing models fall short in simultaneously providing rapid generation speeds and high fidelity to input images - two features essential for practical applications. In this paper, we present One-2-3-45++, an innovative method that transforms a single image into a detailed 3D textured mesh in approximately one minute. Our approach aims to fully harness the extensive knowledge embedded in 2D diffusion models and priors from valuable yet limited 3D data. This is achieved by initially finetuning a 2D diffusion model for consistent multi-view image generation, followed by elevating these images to 3D with the aid of multi-view conditioned 3D native diffusion models. Extensive experimental evaluations demonstrate that our method can produce high-quality, diverse 3D assets that closely mirror the original input image. Our project webpage: https://sudo-ai-3d.github.io/One2345plus_page.
Computer Vision and Pattern Recognition, Artificial Intelligence, Graphics
What problem does this paper attempt to address?
This paper addresses the problem of generating high-quality 3D objects from a single image, with a focus on generation speed and fidelity to the input image. Specifically:

1. **Trade-off between speed and fidelity**: Most existing models cannot deliver fast generation and high fidelity to the input image at the same time, yet both are crucial for practical applications such as virtual reality, augmented reality, and game development.
2. **Limited 3D data**: Because 3D training data is scarce, methods trained directly on 3D shape datasets struggle to generalize to unseen categories in open-world scenarios.
3. **Long optimization times**: Optimization-based methods such as DreamFusion and Magic3D can produce high-quality 3D objects, but generating a single shape may take tens of minutes or even hours, and they are prone to artifacts such as the multi-face (Janus) problem and over-saturated colors.
4. **Inconsistent multi-view images**: Earlier work such as One-2-3-45 used 2D diffusion models to generate multi-view images and then reconstructed 3D shapes from them, but the generated views were often mutually inconsistent, which degraded the 3D reconstruction.

To address these problems, the paper proposes **One-2-3-45++**, which achieves efficient, high-quality 3D object generation in three steps:

- **Consistent multi-view generation**: A fine-tuned 2D diffusion model first generates a set of mutually consistent multi-view images.
- **Multi-view conditioned 3D diffusion**: A 3D native diffusion model conditioned on these multi-view images then lifts them to a textured 3D mesh.
- **Lightweight texture optimization**: Finally, a lightweight optimization step further improves the texture quality of the generated mesh.
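One common way to make a 2D diffusion model produce views that agree with one another is to generate them jointly, tiled into a single image, so that the model's attention layers can relate all views in one pass. The sketch below covers only the packing and unpacking of views into such a grid with NumPy; the six-view count, the 3 × 2 layout, the 320-pixel view size, and the helper names `tile_views`/`untile_views` are assumptions chosen for illustration, not details given in this summary.

```python
import numpy as np


def tile_views(views: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Pack N = rows * cols views of shape (H, W, C) into one (rows*H, cols*W, C) image.

    Assumed layout (hypothetical): row-major, view i goes to grid cell (i // cols, i % cols).
    """
    n, h, w, c = views.shape
    if n != rows * cols:
        raise ValueError(f"expected {rows * cols} views, got {n}")
    # (rows, cols, H, W, C) -> (rows, H, cols, W, C) -> (rows*H, cols*W, C)
    grid = views.reshape(rows, cols, h, w, c).transpose(0, 2, 1, 3, 4)
    return grid.reshape(rows * h, cols * w, c)


def untile_views(image: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Inverse of tile_views: split a grid image back into (rows*cols, H, W, C) views."""
    gh, gw, c = image.shape
    if gh % rows or gw % cols:
        raise ValueError("image size is not divisible by the grid layout")
    h, w = gh // rows, gw // cols
    # (rows, H, cols, W, C) -> (rows, cols, H, W, C) -> (rows*cols, H, W, C)
    grid = image.reshape(rows, h, cols, w, c).transpose(0, 2, 1, 3, 4)
    return grid.reshape(rows * cols, h, w, c)


if __name__ == "__main__":
    # Six hypothetical 320x320 RGB views packed into a 3 x 2 grid.
    views = np.random.default_rng(0).uniform(size=(6, 320, 320, 3))
    canvas = tile_views(views, rows=3, cols=2)
    print(canvas.shape)  # (960, 640, 3)
    assert np.array_equal(untile_views(canvas, rows=3, cols=2), views)
```

Because the round trip is lossless, a generated grid image can be split back into individual views for the subsequent 3D stage.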
The method generates high-fidelity textured 3D meshes from a single RGB image in about one minute and, in the authors' experimental evaluations, shows greater robustness and visual quality than existing methods.
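Because the final step targets texture quality, one way to picture a "lightweight" optimization is to hold the mesh geometry fixed and adjust only its appearance so that it better matches the generated views. The toy sketch below illustrates that idea under strong simplifying assumptions: appearance is stored as per-vertex RGB colours, and "rendering" is a fixed linear map `weights` from vertex colours to observed pixels, as barycentric interpolation would give once visibility is known. The function name `refine_vertex_colors`, the linear renderer, and the synthetic data are hypothetical; this is not the paper's optimization procedure.

```python
import numpy as np


def refine_vertex_colors(weights: np.ndarray, targets: np.ndarray,
                         init_colors: np.ndarray, steps: int = 500):
    """Projected gradient descent on per-vertex RGB colours, geometry fixed.

    weights:     (P, V) hypothetical linear "renderer": row p holds the
                 barycentric weights tying observed pixel p to mesh vertices.
    targets:     (P, 3) RGB values of those pixels in the multi-view images.
    init_colors: (V, 3) starting colours, e.g. taken from the coarse mesh.
    Returns the refined colours and the loss recorded before each step.
    """
    colors = np.clip(init_colors.astype(np.float64), 0.0, 1.0)
    p = weights.shape[0]
    # Loss = (sum of squared pixel errors) / P; its gradient (2/P) W^T r
    # is Lipschitz with constant L = (2/P) * ||W||_2^2 (largest singular value squared).
    lipschitz = 2.0 / p * np.linalg.norm(weights, 2) ** 2
    step = 1.0 / lipschitz  # a 1/L step keeps the loss non-increasing
    losses = []
    for _ in range(steps):
        residual = weights @ colors - targets             # (P, 3) rendering error
        losses.append(float(np.sum(residual ** 2) / p))
        grad = (2.0 / p) * weights.T @ residual           # (V, 3) gradient
        colors = np.clip(colors - step * grad, 0.0, 1.0)  # project to valid RGB range
    return colors, losses


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_colors = rng.uniform(size=(5, 3))         # synthetic "ground-truth" texture
    weights = rng.dirichlet(np.ones(5), size=60)   # 60 pixels, 5 vertices, rows sum to 1
    targets = weights @ true_colors                # what the views would show
    colors, losses = refine_vertex_colors(weights, targets, np.full((5, 3), 0.5))
    print(f"loss {losses[0]:.4f} -> {losses[-1]:.2e}")
```

Stepping with 1/L, where L is the gradient's Lipschitz constant, is a conservative textbook choice that keeps the projected gradient descent loss non-increasing; a real refinement stage would instead differentiate through an actual renderer.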