Magic3DSketch: Create Colorful 3D Models From Sketch-Based 3D Modeling Guided by Text and Language-Image Pre-Training

Ying Zang, Yidong Han, Chaotao Ding, Jianqi Zhang, Tianrun Chen
2024-07-27
Abstract: The demand for 3D content is growing as AR/VR applications emerge. At the same time, 3D modeling remains accessible only to skilled experts, because traditional methods such as Computer-Aided Design (CAD) are labor-intensive and skill-demanding, which makes them challenging for novice users. Our proposed method, Magic3DSketch, encodes a sketch to predict a 3D mesh, guided by a text description and leveraging external prior knowledge obtained through text and language-image pre-training. The integration of language-image pre-trained neural networks compensates for the sparse and ambiguous nature of single-view sketch inputs. According to our user study, our method is also more useful and offers a higher degree of controllability than existing text-to-3D approaches. Moreover, Magic3DSketch achieves state-of-the-art performance on both synthetic and real datasets, and with the help of text input it produces more detailed structures and more realistic shapes. Users in the study were also more satisfied with the models obtained by Magic3DSketch. Additionally, we are, to our knowledge, the first to add color based on text descriptions to sketch-derived shapes. By combining sketches and text guidance with the help of language-image pre-trained models, Magic3DSketch allows novice users to create custom 3D models with minimal effort and maximum creative freedom, with the potential to revolutionize future 3D modeling pipelines.
Computer Vision and Pattern Recognition, Multimedia
What problem does this paper attempt to address?
The paper attempts to address the problem of simplifying the 3D modeling process, making it easy for non-professional users to create high-quality 3D models. Specifically:

- **Current Problem**: Current 3D modeling methods (such as CAD software) require a high level of professional skill and long learning periods, which pose a significant barrier for novice users.
- **Solution**: A new method called Magic3DSketch is proposed, which combines hand-drawn sketches and textual descriptions to generate high-fidelity 3D models. By using language-image pre-trained models (such as CLIP), this method can predict detailed 3D meshes from single-view sketches and also add color information.
- **Advantages**: Compared to existing text-based 3D modeling methods, Magic3DSketch offers higher controllability and practicality. Experimental results show that this method achieves state-of-the-art performance on both synthetic and real datasets, and it generates models faster (over 100 frames per second). Additionally, users have higher satisfaction with the 3D models generated by Magic3DSketch.

In summary, this paper aims to lower the technical threshold of 3D modeling by combining sketch and text inputs and leveraging the advantages of pre-trained models, enabling more people to easily create complex 3D models.
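
To make the idea of text guidance from a language-image pre-trained model more concrete, the following is a minimal sketch of how such guidance is often expressed: as a cosine-similarity term between the embedding of a rendered view and the embedding of the text prompt. This is an illustration under stated assumptions, not the loss used by Magic3DSketch, which is not specified in the summary above. The function names (`cosine_similarity`, `text_guidance_loss`) and the toy three-dimensional vectors are hypothetical; in practice the embeddings would come from the image and text encoders of a model such as CLIP.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def text_guidance_loss(render_embedding, text_embedding):
    """Hypothetical guidance term: 0 when the two embeddings point the
    same way, 1 when they are orthogonal, 2 when they are opposite."""
    return 1.0 - cosine_similarity(render_embedding, text_embedding)


# Toy embeddings standing in for encoder outputs (assumed values,
# not real CLIP features).
text_emb = [1.0, 0.0, 0.0]
aligned_render = [2.0, 0.0, 0.0]    # same direction as the text embedding
unrelated_render = [0.0, 3.0, 0.0]  # orthogonal to the text embedding

loss_aligned = text_guidance_loss(aligned_render, text_emb)
loss_unrelated = text_guidance_loss(unrelated_render, text_emb)
```

In this toy setup the aligned render receives a lower loss than the unrelated one, which is the property a training or optimization procedure would exploit to pull the predicted shape and color toward the text description.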