Text-Free Controllable 3-D Point Cloud Generation
Haihong Xiao, Wenxiong Kang, Yuqiong Li, Hongbin Xu
DOI: https://doi.org/10.1109/tim.2024.3353839
IF: 5.6
2024-02-10
IEEE Transactions on Instrumentation and Measurement
Abstract: Generating 3-D shapes from text inputs has long been a distinctive challenge in computer vision, requiring methodological know-how as well as a sense of art. Recently, text-to-image generation has made remarkable progress, raising tremendous interest in text-guided shape generation, which in turn paves the way for industrial design. Nevertheless, prior efforts on text-guided 3-D synthesis either lack geometric details, are limited to simple text input, or need expensive optimization and additional postprocessing, which makes them unfriendly to novices. In this research, we present TFCNet, a novel approach for text-free controllable point cloud generation. In the training phase, we first design an empirically robust cross-modal skeletal point generator (CM-SPG) to predict the skeletal points of a specific shape conditioned on a single input image. Then, we develop a diffusion-based dense point generator, which takes the skeletal points as geometric guidance to produce dense point clouds that are faithful to the input images. In the inference phase, we propose an efficient text-free nonparametric transfer regime, which does not require separate training and can directly generate point cloud shapes while being semantically faithful to the provided text input. As evidenced by our experiments on the ShapeNet(v2) and CO3D datasets, our proposed method outperforms existing state-of-the-art methods both quantitatively and qualitatively.
engineering, electrical & electronic; instruments & instrumentation
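
The abstract's second stage, a diffusion-based generator that turns skeletal points into a dense point cloud, can be illustrated with a generic DDPM-style reverse sampling loop. The sketch below is not TFCNet: the names `make_schedule`, `toy_denoiser`, and `sample_dense_points`, the noise schedule, and all hyperparameters are assumptions chosen for illustration, and the learned noise-prediction network is replaced by a hand-written stand-in that simply pulls each point toward its nearest skeletal point.

```python
import numpy as np


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.05):
    """Linear variance schedule (an assumed choice, not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_denoiser(x_t, skeleton, alpha_bar_t):
    """Hypothetical stand-in for a learned noise-prediction network.

    It guesses that each clean point lies on its nearest skeletal point and
    returns the noise implied by that guess under the forward process
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.
    """
    d2 = ((x_t[:, None, :] - skeleton[None, :, :]) ** 2).sum(axis=-1)
    x0_hat = skeleton[d2.argmin(axis=1)]
    return (x_t - np.sqrt(alpha_bar_t) * x0_hat) / np.sqrt(1.0 - alpha_bar_t)


def sample_dense_points(skeleton, num_points=256, num_steps=200, seed=0):
    """DDPM ancestral sampling, conditioned on skeletal points via the denoiser."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = rng.standard_normal((num_points, 3))  # start from Gaussian noise
    for t in range(num_steps - 1, -1, -1):
        eps_hat = toy_denoiser(x, skeleton, alpha_bars[t])
        # Posterior mean of x_{t-1} given x_t and the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        # No noise is added at the final step.
        noise = rng.standard_normal(x.shape) if t > 0 else np.zeros_like(x)
        x = mean + np.sqrt(betas[t]) * noise
    return x


if __name__ == "__main__":
    # A tiny hypothetical skeleton: eight points along a line segment.
    skeleton = np.stack(
        [np.linspace(-1.0, 1.0, 8), np.zeros(8), np.zeros(8)], axis=1
    )
    cloud = sample_dense_points(skeleton)
    print(cloud.shape)
```

Because the stand-in denoiser always predicts the nearest skeletal point as the clean sample, the generated points end up on the skeleton itself; a trained network would instead place them on the surrounding surface. The loop structure and the conditioning on skeletal points are the parts this sketch is meant to show.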