PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models

Minghao Chen, Roman Shapovalov, Iro Laina, Tom Monnier, Jianyuan Wang, David Novotny, Andrea Vedaldi
2024-12-25
Abstract: Text- or image-to-3D generators and 3D scanners can now produce 3D assets with high-quality shapes and textures. These assets typically consist of a single, fused representation, like an implicit neural field, a Gaussian mixture, or a mesh, without any useful structure. However, most applications and creative workflows require assets to be made of several meaningful parts that can be manipulated independently. To address this gap, we introduce PartGen, a novel approach that generates 3D objects composed of meaningful parts starting from text, an image, or an unstructured 3D object. First, given multiple views of a 3D object, generated or rendered, a multi-view diffusion model extracts a set of plausible and view-consistent part segmentations, dividing the object into parts. Then, a second multi-view diffusion model takes each part separately, fills in the occlusions, and uses those completed views for 3D reconstruction by feeding them to a 3D reconstruction network. This completion process considers the context of the entire object to ensure that the parts integrate cohesively. The generative completion model can make up for the information missing due to occlusions; in extreme cases, it can hallucinate entirely invisible parts based on the input 3D asset. We evaluate our method on generated and real 3D assets and show that it outperforms segmentation and part-extraction baselines by a large margin. We also showcase downstream applications such as 3D part editing.
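The abstract stresses that the segmenter returns a *set* of plausible segmentations rather than one answer. The minimal Python sketch below illustrates why that helps. It is not the paper's evaluation protocol. It scores K sampled segmentations against a reference by best-match IoU and keeps the best sample. The names `iou`, `segmentation_score`, and `best_of_k`, and the toy pixel-ID masks, are all hypothetical.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical encoding: a mask is the set of pixel IDs (pooled over all views)
# that belong to one part; a segmentation is a list of such masks.
Mask = FrozenSet[int]
Segmentation = List[Mask]


def iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two part masks."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def segmentation_score(pred: Segmentation, ref: Segmentation) -> float:
    """Mean, over reference parts, of the IoU with the best-matching predicted part.

    Matching is greedy and not one-to-one: a simplification for illustration,
    not a metric taken from the paper.
    """
    if not ref:
        return 0.0
    best = [max((iou(r, p) for p in pred), default=0.0) for r in ref]
    return sum(best) / len(ref)


def best_of_k(samples: List[Segmentation], ref: Segmentation) -> Tuple[int, float]:
    """Return (index, score) of the sampled segmentation that best matches ref."""
    scores = [segmentation_score(s, ref) for s in samples]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores[best_index]


# Toy object with four pixels: {0, 1} could be "legs", {2, 3} a "seat".
samples = [
    [frozenset({0, 1, 2, 3})],                            # whole object as one part
    [frozenset({0, 1}), frozenset({2, 3})],               # two parts
    [frozenset({0}), frozenset({1}), frozenset({2, 3})],  # over-segmented legs
]
ref_a = [frozenset({0, 1}), frozenset({2, 3})]  # annotator who splits the object
ref_b = [frozenset({0, 1, 2, 3})]               # annotator who keeps it whole

print(best_of_k(samples, ref_a))  # (1, 1.0)
print(best_of_k(samples, ref_b))  # (0, 1.0)
```

Two hypothetical annotators with different intentions are each matched perfectly, but by different samples. A single deterministic segmentation could satisfy at most one of them.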
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### What problems does this paper attempt to solve?

This paper addresses the problem of generating 3D objects composed of meaningful parts from text, images, or existing unstructured 3D objects. Specifically, it targets a key shortcoming of current 3D generation methods. These methods can produce high-quality shapes and textures, but the resulting 3D assets are usually single, fused representations (such as implicit neural fields, Gaussian mixtures, or meshes) that lack useful structure. Most applications and creative workflows, however, require 3D assets made of multiple meaningful parts that can be manipulated independently.

**Main problems include:**

1. **How can 3D objects be segmented automatically into meaningful parts?**
   - Different artists may decompose the same object in different ways, so there is no "gold standard" segmentation. The segmentation method should therefore model multiple plausible segmentations rather than produce a single result.
2. **How can complete, high-quality 3D parts be extracted when they are partially or completely invisible?**
   - Current 3D reconstruction and generation methods usually model only the visible outer surface of an object and ignore internal or occluded parts. Decomposing an object into parts therefore often requires completing those parts, or even generating invisible parts from scratch.

### Overview of the solution

To address these challenges, the authors propose **PartGen**, a method based on multi-view diffusion models for generating 3D objects with meaningful parts. PartGen works in three steps:

1. **Multi-view segmentation**:
   - A multi-view diffusion model segments the object consistently across multiple views into plausible parts. Because the model is generative, it can produce several alternative segmentations, capturing the intentions of different artists.
2. **Context-aware part completion**:
   - For each segmented part, a second multi-view diffusion model completes the part's views. It conditions on the entire object so that the completed parts fit together correctly.
3. **3D reconstruction**:
   - Finally, a pre-trained 3D reconstruction network converts the completed multi-view images into 3D parts.

With this approach, PartGen produces high-quality 3D parts that are semantically meaningful and can be edited and manipulated independently. PartGen also supports 3D part editing driven by text instructions, which adds flexibility and control.

### Application scenarios

Application scenarios for PartGen include, but are not limited to:

- **3D art creation**: generating 3D models with editable parts that artists can modify and reuse.
- **Video game development**: configuring character parts dynamically, such as changing clothes or equipping weapons.
- **Robotics and spatial intelligence**: understanding the structure of 3D objects to support more complex interactions and manipulation.

By producing structured, part-based assets instead of single fused shapes, PartGen makes the outputs of 3D generation and reconstruction considerably more practical to use.
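The three steps in the solution overview (segmentation, context-aware completion, reconstruction) can be sketched as a simple loop. This is a minimal Python outline, not the authors' code, and it rests on several assumptions. Views are flattened grayscale lists. Each part is given as one mask per view. `segment`, `complete`, and `reconstruct` are hypothetical stand-ins for the two diffusion models and the reconstruction network. The detail it illustrates is that the completion step receives the masked part together with the unmasked object views as context.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

View = List[float]  # hypothetical: one flattened grayscale image per viewpoint
Mask = List[bool]   # per-pixel membership of one part in one view
BACKGROUND = 1.0    # assumed value for blanked-out (white) pixels


@dataclass
class CompletionInput:
    part_views: List[View]     # the part as seen in each view, everything else blanked
    context_views: List[View]  # the full object, so completions stay consistent with it


def mask_views(views: Sequence[View], masks: Sequence[Mask]) -> List[View]:
    """Keep pixels inside each view's mask and replace the rest with BACKGROUND."""
    return [
        [px if keep else BACKGROUND for px, keep in zip(view, mask)]
        for view, mask in zip(views, masks)
    ]


def build_completion_input(views: Sequence[View], part_masks: Sequence[Mask]) -> CompletionInput:
    return CompletionInput(
        part_views=mask_views(views, part_masks),
        context_views=[list(v) for v in views],
    )


def partgen_pipeline(
    views: List[View],
    segment: Callable[[List[View]], List[List[Mask]]],
    complete: Callable[[CompletionInput], List[View]],
    reconstruct: Callable[[List[View]], object],
) -> list:
    """Segment, then complete each part with object context, then reconstruct each part."""
    parts = []
    for part_masks in segment(views):  # one list of per-view masks per part
        completed = complete(build_completion_input(views, part_masks))
        parts.append(reconstruct(completed))
    return parts


# Toy stand-ins for the learned models (all hypothetical).
def toy_segment(views: List[View]) -> List[List[Mask]]:
    part_a = [[True, True, False, False], [True, True, False, False]]
    part_b = [[False, False, True, True], [False, False, False, True]]  # less visible in view 1
    return [part_a, part_b]


def toy_complete(inp: CompletionInput) -> List[View]:
    return inp.part_views  # identity: a real model would inpaint occluded pixels


def toy_reconstruct(views: List[View]) -> int:
    return sum(px != BACKGROUND for view in views for px in view)  # counts foreground pixels


views = [[0.2, 0.3, 0.8, 0.9], [0.2, 0.4, 0.7, 0.9]]
print(partgen_pipeline(views, toy_segment, toy_complete, toy_reconstruct))  # [4, 3]
```

In the toy run, part B covers fewer pixels in the second view. The identity `toy_complete` leaves the missing pixel blank, so the stand-in reconstructor counts 3 foreground pixels instead of 4. A real completion model would be expected to fill that gap, using the context views to keep the part consistent with the rest of the object.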