Abstract: Generating 3D meshes from a single image is an important but ill-posed task. Existing methods mainly adopt 2D multiview diffusion models to generate intermediate multiview images, and use the Large Reconstruction Model (LRM) to create the final meshes. However, the multiview images exhibit local inconsistencies, and the meshes often lack fidelity to the input image or look blurry. We propose Fancy123, featuring two enhancement modules and an unprojection operation to address the above three issues, respectively. The appearance enhancement module deforms the 2D multiview images to realign misaligned pixels for better multiview consistency. The fidelity enhancement module deforms the 3D mesh to match the input image. The unprojection of the input image and deformed multiview images onto LRM's generated mesh ensures high clarity, discarding LRM's predicted blurry-looking mesh colors. Extensive qualitative and quantitative experiments verify Fancy123's SoTA performance with significant improvement. Also, the two enhancement modules are plug-and-play and work at inference time, allowing seamless integration into various existing single-image-to-3D methods.

What problem does this paper attempt to address?

This paper addresses three problems that arise when generating high-quality 3D meshes from a single image:

1. **Multiview inconsistency**: The multiview images generated by existing methods often contain locally misaligned pixels, so the views disagree with one another.
2. **Low fidelity to the input image**: The generated 3D mesh often does not match the input image well, showing shape distortion or loss of detail.
3. **Blurry colors**: The generated 3D mesh often looks blurry, mainly because of the multiview inconsistency and the inaccurate color mapping from images to the mesh.

To address these problems, the paper proposes Fancy123, whose main components are:

- **Appearance enhancement module**: Corrects inconsistencies among the multiview images through 2D image deformation. Specifically, the module optimizes a grid-based 2D deformation field so that the deformed multiview images agree better when projected onto the initial mesh. In formula form: \[ I' = D_{2D}(I, F) \] where \(I\) is a generated multiview image, \(I'\) is the deformed image, and \(F\) is the deformation field.
- **Fidelity enhancement module**: Brings the generated 3D mesh closer to the input image through 3D mesh deformation. Specifically, the module parameterizes the mesh deformation with a Jacobian field and optimizes this field to minimize the difference between the rendered mesh and the input image. In formula form: \[ V' = \arg\min_V \|LV - \nabla^T AJ\|^2 \] where \(V'\) denotes the deformed mesh vertices, \(L\) is the Laplacian operator, \(\nabla\) is the gradient operator, \(A\) is the mass matrix, and \(J\) is the Jacobian field.
- **Unprojection (back-projection) operation**: Unprojects the input image and the deformed multiview images onto the mesh generated by the Large Reconstruction Model (LRM), replacing the LRM's blurry predicted colors so that the final mesh has sharp, well-mapped colors.

In summary, Fancy123 improves single-image 3D mesh generation by combining two enhancement modules (appearance and fidelity) with the unprojection operation, targeting the multiview inconsistency, low fidelity, and blurry colors of existing methods.
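To make the appearance enhancement formula \(I' = D_{2D}(I, F)\) concrete, the following is a minimal sketch of how a grid-based deformation field could be applied to an image. It assumes a coarse grid of pixel offsets that is bilinearly upsampled to every pixel and used for backward warping. The function names `bilinear` and `warp`, the grayscale list-of-lists image representation, and the backward-warping convention are illustrative assumptions, not the paper's implementation. The optimization of \(F\) itself is omitted.

```python
from typing import List

Array2D = List[List[float]]  # indexed as array[y][x]


def bilinear(values: Array2D, x: float, y: float) -> float:
    """Bilinearly interpolate `values` at continuous (x, y), clamping to the border."""
    h, w = len(values), len(values[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    tx, ty = x - x0, y - y0
    top = values[y0][x0] * (1 - tx) + values[y0][x1] * tx
    bottom = values[y1][x0] * (1 - tx) + values[y1][x1] * tx
    return top * (1 - ty) + bottom * ty


def warp(image: Array2D, field_dx: Array2D, field_dy: Array2D) -> Array2D:
    """Hypothetical D_2D: backward-warp `image` by a coarse offset grid F = (field_dx, field_dy)."""
    h, w = len(image), len(image[0])
    gh, gw = len(field_dx), len(field_dx[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Map the pixel position into coarse-grid coordinates.
            gx = x * (gw - 1) / (w - 1) if w > 1 else 0.0
            gy = y * (gh - 1) / (h - 1) if h > 1 else 0.0
            # Upsample the coarse offsets to this pixel.
            dx = bilinear(field_dx, gx, gy)
            dy = bilinear(field_dy, gx, gy)
            # Backward warping: read the source image at the displaced location.
            row.append(bilinear(image, x + dx, y + dy))
        out.append(row)
    return out
```

With an all-zero field, `warp` returns the image unchanged. With a constant offset of one pixel in x, each output pixel takes the value of its right-hand neighbor, clamped at the border. A practical version would use a differentiable sampler so that the field can be optimized by gradient descent.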
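The fidelity enhancement formula can be read as the linear system that results from a Poisson-style fit. The following is a sketch under two assumptions that the summary does not state: that the deformed vertices are those whose per-face gradients best match the target Jacobians \(J\) under the area weighting \(A\) (the usual formulation for Jacobian-field deformation), and that \(L = \nabla^T A \nabla\).

\[ V' = \arg\min_V \|\nabla V - J\|_A^2 = \arg\min_V \, (\nabla V - J)^T A \, (\nabla V - J) \]

Setting the derivative with respect to \(V\) to zero gives the normal equations:

\[ \nabla^T A \nabla V' = \nabla^T A J \quad\Longleftrightarrow\quad L V' = \nabla^T A J \]

Under these assumptions, minimizing \(\|LV - \nabla^T AJ\|^2\) amounts to solving \(LV' = \nabla^T AJ\). Because constant vectors lie in the null space of \(L\), the solution is determined only up to a global translation, which is typically fixed by pinning one vertex or the centroid.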

Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation
