Abstract: Text- or image-to-3D generators and 3D scanners can now produce 3D assets with high-quality shapes and textures. These assets typically consist of a single, fused representation, like an implicit neural field, a Gaussian mixture, or a mesh, without any useful structure. However, most applications and creative workflows require assets to be made of several meaningful parts that can be manipulated independently. To address this gap, we introduce PartGen, a novel approach that generates 3D objects composed of meaningful parts starting from text, an image, or an unstructured 3D object. First, given multiple views of a 3D object, generated or rendered, a multi-view diffusion model extracts a set of plausible and view-consistent part segmentations, dividing the object into parts. Then, a second multi-view diffusion model takes each part separately, fills in the occlusions, and uses those completed views for 3D reconstruction by feeding them to a 3D reconstruction network. This completion process considers the context of the entire object to ensure that the parts integrate cohesively. The generative completion model can make up for the information missing due to occlusions; in extreme cases, it can hallucinate entirely invisible parts based on the input 3D asset. We evaluate our method on generated and real 3D assets and show that it outperforms segmentation and part-extraction baselines by a large margin. We also showcase downstream applications such as 3D part editing.

What problem does this paper attempt to address?

### What problems does this paper attempt to solve?

This paper aims to generate 3D objects composed of meaningful parts from text, images, or existing unstructured 3D objects. Specifically, it addresses a key shortcoming of current 3D generation methods: although they produce high-quality shapes and textures, the resulting 3D assets are usually single, fused representations (such as implicit neural fields, Gaussian mixtures, or meshes) that lack useful structure. Most applications and creative workflows instead require 3D assets made of multiple meaningful parts that can be manipulated independently.

**Main problems include:**

1. **How to automatically segment 3D objects into meaningful parts?**
   - Different artists may decompose the same object in different ways, so there is no "gold standard" segmentation. The segmentation method should therefore model multiple plausible segmentations rather than produce a single result.
2. **How to extract high-quality, complete 3D parts when they are partially or completely invisible?**
   - Current 3D reconstruction and generation methods usually model only the visible outer surface of an object and ignore internal or occluded regions. Decomposing an object into parts therefore often requires completing those parts, or even generating invisible parts from scratch.

### Overview of the solution

To address these challenges, the authors propose **PartGen**, a method based on multi-view diffusion models for generating 3D objects with meaningful parts. PartGen proceeds in three steps:

1. **Multi-view segmentation**:
   - A multi-view diffusion model segments the 3D object consistently across multiple views and identifies plausible parts. Because the model is generative, it can produce multiple possible segmentations, capturing the different ways artists might decompose the object.
2. **Context-aware part completion**:
   - For each segmented part, a second multi-view diffusion model fills in the occluded regions, taking the context of the entire object into account so that the completed parts fit together cohesively.
3. **3D reconstruction**:
   - Finally, a pre-trained 3D reconstruction network converts the completed multi-view images of each part into a 3D part.

In this way, PartGen produces high-quality 3D parts that are semantically meaningful and can be edited and manipulated independently. PartGen also supports 3D part editing based on text instructions, which adds flexibility and control.

### Application scenarios

The application scenarios of PartGen include but are not limited to:

- **3D art creation**: Generate 3D models with editable parts for artists to modify and reuse.
- **Video game development**: Dynamically configure character parts, such as changing clothes or equipping weapons.
- **Robotics and spatial intelligence**: Understand the structure of 3D objects to support more complex interactions and operations.

By producing structured, part-level assets, PartGen makes generated and reconstructed 3D content more practical for these workflows; the authors report that it outperforms segmentation and part-extraction baselines by a large margin on generated and real 3D assets.
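
The three stages in the solution overview (segmentation, completion, reconstruction) can be read as a simple pipeline. The sketch below is only an illustration of that data flow, not the authors' implementation: every name in it (`segment_views`, `complete_part`, `reconstruct`, `partgen_sketch`, `Part3D`) is hypothetical, and each learned model is replaced by a toy rule on small pixel grids so the example runs on its own.

```python
from dataclasses import dataclass
from typing import List

View = List[List[float]]  # one rendered view as a grid of pixel intensities
Mask = List[List[int]]    # per-pixel part id, where 0 means background


@dataclass
class Part3D:
    """Hypothetical placeholder for a reconstructed 3D part."""
    part_id: int
    views: List[View]


def segment_views(views: List[View]) -> List[Mask]:
    # Stand-in for the multi-view segmentation diffusion model (assumption).
    # Toy rule: 0 = background, 1 = dark pixels, 2 = bright pixels.
    return [[[0 if p == 0 else (2 if p >= 0.5 else 1) for p in row]
             for row in view] for view in views]


def complete_part(views: List[View], masks: List[Mask], part_id: int) -> List[View]:
    # Stand-in for the context-aware completion model (assumption).
    # A real model would also inpaint occluded regions, conditioned on the
    # full-object views; this toy version only isolates the visible pixels.
    return [[[p if m == part_id else 0.0 for p, m in zip(vrow, mrow)]
             for vrow, mrow in zip(view, mask)]
            for view, mask in zip(views, masks)]


def reconstruct(part_views: List[View], part_id: int) -> Part3D:
    # Stand-in for the pre-trained 3D reconstruction network (assumption).
    return Part3D(part_id=part_id, views=part_views)


def partgen_sketch(views: List[View]) -> List[Part3D]:
    # Stage 1: segment every view into part ids.
    masks = segment_views(views)
    # Collect the distinct non-background part ids seen in any view.
    part_ids = sorted({m for mask in masks for row in mask for m in row if m != 0})
    # Stages 2 and 3: complete each part's views, then reconstruct it.
    return [reconstruct(complete_part(views, masks, pid), pid) for pid in part_ids]
```

Calling `partgen_sketch` on a list of views returns one `Part3D` per detected part id. In the method as described, the segmentation stage is generative, so sampling it repeatedly could yield different decompositions of the same object; the deterministic toy rule here does not capture that.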

PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models

Part123: Part-aware 3D Reconstruction from a Single-view Image

IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation

SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation

3DDesigner: Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models

PhysPart: Physically Plausible Part Completion for Interactable Objects

ET3D: Efficient Text-to-3D Generation via Multi-View Distillation

Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior

Part-aware Shape Generation with Latent 3D Diffusion of Neural Voxel Fields

Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model

Gen-3Diffusion: Realistic Image-to-3D Generation via 2D & 3D Diffusion Synergy

DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data

MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement

DiffFacto: Controllable Part-Based 3D Point Cloud Generation with Cross Diffusion

DiffTF++: 3D-aware Diffusion Transformer for Large-Vocabulary 3D Generation

MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction

IT3D: Improved Text-to-3D Generation with Explicit View Synthesis

Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation

HexaGen3D: StableDiffusion is just one step away from Fast and Diverse Text-to-3D Generation

VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation

Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models