Zero3D: Semantic-Driven 3D Shape Generation for Zero-Shot Learning.

Bo Han, Yixuan Shen, Yitong Fu
DOI: https://doi.org/10.1007/978-3-031-50072-5_33
2024-01-01
Abstract: Semantic-driven 3D shape generation aims to generate 3D shapes conditioned on textual input. However, previous approaches have struggled with single-category generation, low-frequency details, and the need for large quantities of paired data. To address these issues, we propose a multi-category diffusion model. Specifically, our approach includes the following components: 1) To mitigate the scarcity of large-scale paired data, we connect text, 2D images, and 3D shapes through the pre-trained CLIP model, enabling zero-shot learning. 2) To obtain the multi-category 3D shape feature, we employ a conditional flow model to generate a multi-category shape vector conditioned on the CLIP embedding. 3) To generate multi-category 3D shapes, we utilize a hidden-layer diffusion model conditioned on the multi-category shape vector, which significantly reduces training time and memory consumption. We evaluate the generated results of our framework and demonstrate that our method outperforms existing methods. The code and more qualitative samples can be found at the project website.
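Component 2 of the abstract, a conditional flow that maps noise to a shape vector given a CLIP embedding, can be illustrated with a minimal sketch. The abstract does not specify the flow architecture, so the single affine coupling layer below is an assumption, as are all names (`ConditionalAffineCoupling`, `sample_shape_vector`), the dimensions, and the fixed random weights that stand in for a trained network. The stand-in `clip_embedding` is a plain list rather than the output of a real CLIP encoder.

```python
import math
import random


def make_weights(in_dim, out_dim, seed):
    """Fixed random linear map standing in for a trained conditioner network."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def linear(weights, x):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class ConditionalAffineCoupling:
    """One affine coupling layer whose scale and shift depend on a condition vector.

    Hypothetical illustration only: the first half of the input passes through
    unchanged, and the second half is scaled and shifted by functions of the
    first half concatenated with the condition.
    """

    def __init__(self, dim, cond_dim, seed=0):
        self.half = dim // 2
        in_dim = self.half + cond_dim
        out_dim = dim - self.half
        self.w_scale = make_weights(in_dim, out_dim, seed)
        self.w_shift = make_weights(in_dim, out_dim, seed + 1)

    def _params(self, first_half, cond):
        h = first_half + cond  # list concatenation
        log_scale = [math.tanh(v) for v in linear(self.w_scale, h)]  # bounded
        shift = linear(self.w_shift, h)
        return log_scale, shift

    def forward(self, z, cond):
        """Map noise z to a shape vector (sampling direction)."""
        z_a, z_b = z[:self.half], z[self.half:]
        log_scale, shift = self._params(z_a, cond)
        x_b = [b * math.exp(s) + t for b, s, t in zip(z_b, log_scale, shift)]
        return z_a + x_b

    def inverse(self, x, cond):
        """Map a shape vector back to noise (the direction used for likelihoods)."""
        x_a, x_b = x[:self.half], x[self.half:]
        log_scale, shift = self._params(x_a, cond)
        z_b = [(b - t) * math.exp(-s) for b, s, t in zip(x_b, log_scale, shift)]
        return x_a + z_b


def sample_shape_vector(layer, clip_embedding, dim, seed=0):
    """Draw Gaussian noise and push it through the flow under the given condition."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return z, layer.forward(z, clip_embedding)
```

Because the condition only needs to be a vector in CLIP's embedding space, a flow of this kind could in principle be trained on image embeddings and queried with text embeddings at inference time, which is one way to read the zero-shot setup the abstract describes. In the described pipeline the resulting shape vector would then condition the diffusion model, which is not sketched here.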