DiffSurf: A Transformer-based Diffusion Model for Generating and Reconstructing 3D Surfaces in Pose

Yusuke Yoshiyasu, Leyuan Sun

2024-08-27

Abstract: This paper presents DiffSurf, a transformer-based denoising diffusion model for generating and reconstructing 3D surfaces. Specifically, we design a diffusion transformer architecture that predicts noise from noisy 3D surface vertices and normals. With this architecture, DiffSurf is able to generate 3D surfaces in various poses and shapes, such as human bodies, hands, animals and man-made objects. Further, DiffSurf is versatile in that it can address various 3D downstream tasks including morphing, body shape variation and 3D human mesh fitting to 2D keypoints. Experimental results on 3D human model benchmarks demonstrate that DiffSurf can generate shapes with greater diversity and higher quality than previous generative models. Furthermore, when applied to the task of single-image 3D human mesh recovery, DiffSurf achieves accuracy comparable to prior techniques at a near real-time rate.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper aims to address the following issues:

1. **Generating 3D surfaces in diverse poses and shapes**: Existing diffusion models do not maintain point-to-point correspondence when generating 3D human or animal bodies in different poses, which limits the quality and diversity of the generated results. DiffSurf maintains point-to-point correspondence across shapes by adopting a vertex-based mesh recovery paradigm.
2. **Handling various object types**: The model should handle a wide range of object types, such as human bodies, mammals, and man-made objects. DiffSurf uses a general network architecture that can generate and reconstruct 3D surfaces in various shapes and poses.
3. **Multifunctional framework**: A single framework should handle multiple tasks (e.g., interpolating between two shapes, changing poses, and manipulating shapes). DiffSurf provides a way to reuse pre-trained models to solve various 3D processing tasks.

By proposing DiffSurf, a transformer-based denoising diffusion model, the paper addresses these challenges, achieving high-quality and diverse 3D surface generation and performing well on different downstream tasks.
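To make the phrase "predicts noise from noisy 3D surface vertices and normals" concrete, the following is a minimal sketch of the standard denoising-diffusion forward step applied to per-vertex tokens of the form [x, y, z, nx, ny, nz]. It is not the authors' implementation: the linear beta schedule, the step count, the 6D token layout, and the helper names `linear_alpha_bars`, `add_noise`, and `noise_mse` are all assumptions made for illustration, and the transformer itself is omitted.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars = []
    running = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(tokens, alpha_bar, rng):
    """Forward-diffuse per-vertex tokens [x, y, z, nx, ny, nz].

    Computes x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps
    and returns (noisy_tokens, eps).
    """
    signal = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    noise = [[rng.gauss(0.0, 1.0) for _ in tok] for tok in tokens]
    noisy = [
        [signal * v + sigma * e for v, e in zip(tok, eps)]
        for tok, eps in zip(tokens, noise)
    ]
    return noisy, noise


def noise_mse(predicted, target):
    """Mean squared error between predicted and true noise."""
    diffs = [
        (p - t) ** 2
        for pred_row, true_row in zip(predicted, target)
        for p, t in zip(pred_row, true_row)
    ]
    return sum(diffs) / len(diffs)


# Toy surface: three vertices of a triangle in the z = 0 plane, normals along +z.
tokens = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0, 0.0, 1.0],
]

alpha_bars = linear_alpha_bars(1000)  # hypothetical schedule length
rng = random.Random(0)
noisy, noise = add_noise(tokens, alpha_bars[500], rng)

# A trained network would predict `noise` from `noisy` and the step index;
# here the true noise stands in for a perfect prediction, giving zero loss.
loss = noise_mse(noise, noise)
```

Because every vertex keeps its index through the noising step, a model trained this way operates on a fixed vertex ordering, which is one plausible reading of how a vertex-based formulation preserves point-to-point correspondence across generated shapes.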

Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models

DynoSurf: Neural Deformation-based Temporally Consistent Dynamic Surface Reconstruction

DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans

DiffTF++: 3D-aware Diffusion Transformer for Large-Vocabulary 3D Generation

RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation

Cortical Surface Diffusion Generative Models

$α$Surf: Implicit Surface Reconstruction for Semi-Transparent and Thin Objects with Decoupled Geometry and Opacity

TetraDiffusion: Tetrahedral Diffusion Models for 3D Shape Generation

MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction

Gen-3Diffusion: Realistic Image-to-3D Generation via 2D & 3D Diffusion Synergy

Wonder3D: Single Image to 3D Using Cross-Domain Diffusion

GSurf: 3D Reconstruction via Signed Distance Fields with Direct Gaussian Supervision

PolyDiffuse: Polygonal Shape Reconstruction via Guided Set Diffusion Models

Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models

Single-Image 3D Human Digitization with Shape-Guided Diffusion

DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation

VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation

MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model

Sur2f: A Hybrid Representation for High-Quality and Efficient Surface Reconstruction from Multi-view Images

UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction