CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches

Sifan Wu, Amir Khasahmadi, Mor Katz, Pradeep Kumar Jayaraman, Yewen Pu, Karl Willis, Bang Liu
2024-09-26
Abstract: Parametric Computer-Aided Design (CAD) is central to contemporary mechanical design. However, it encounters challenges in achieving precise parametric sketch modeling and lacks practical evaluation metrics suitable for mechanical design. We harness the capabilities of pre-trained foundation models, renowned for their successes in natural language processing and computer vision, to develop generative models specifically for CAD. These models are adept at understanding complex geometries and design reasoning, a crucial advancement in CAD technology. In this paper, we propose CadVLM, an end-to-end vision language model for CAD generation. Our approach involves adapting pre-trained foundation models to manipulate engineering sketches effectively, integrating both sketch primitive sequences and sketch images. Extensive experiments demonstrate superior performance on multiple CAD sketch generation tasks such as CAD autocompletion, CAD autoconstraint, and image conditional generation. To our knowledge, this is the first instance of a multimodal Large Language Model (LLM) being successfully applied to parametric CAD generation, representing a pioneering step in the field of computer-aided mechanical design.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to address several key issues in parametric Computer-Aided Design (CAD):

1. **Accurate Parametric Sketch Modeling**:
   - Parametric CAD is crucial in modern mechanical design, but current methods struggle to model parametric sketches accurately.
   - Traditional CAD workflows often require geometric shapes and constraints to be completed by hand, which is both time-consuming and error-prone.
2. **Lack of Suitable Evaluation Metrics for Mechanical Design**:
   - Current CAD generation models lack practical evaluation metrics suited to mechanical design.
3. **Utilization of Multimodal Data**:
   - Existing CAD generation models mainly rely on single-modal data, such as text or images, and do not combine the two to improve generation quality.

### Solution

To address the above issues, the paper proposes **CadVLM** (CAD Vision Language Model), an end-to-end multimodal vision language model designed for CAD generation tasks. Specifically:

- **Multimodal Fusion**: CadVLM combines pre-trained vision and language models and processes both sketch primitive sequences (as text) and rendered images of engineering sketches.
- **Automatic Generation**: Given a partial sketch, CadVLM generates the remaining geometric shapes and constraints.
- **New Evaluation Metrics**: The paper introduces three new evaluation metrics (Entity Accuracy, Sketch Accuracy, and CAD F1 Score) to quantitatively assess the quality of the generated CAD sketches.

### Main Contributions

1. **Proposing CadVLM**: According to the authors, this is the first application of a multimodal LLM, combining vision and text data, to parametric CAD generation.
2. **Introducing New Evaluation Metrics**: Providing three new evaluation metrics to quantify the quality of generated CAD sketches.
3. **Superior Performance**: On the SketchGraphs dataset, CadVLM outperforms existing baseline models on the CAD autocompletion and autoconstraint tasks.

Through these innovations, CadVLM offers a new, efficient method for parametric CAD generation, potentially simplifying and accelerating the mechanical design process.
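
The summary names the three metrics (Entity Accuracy, Sketch Accuracy, and CAD F1 Score) without defining them. The following is a minimal sketch of how metrics of this kind could be computed, not the paper's definitions. It assumes a sketch is a list of hashable entity tuples (a hypothetical encoding), that Entity Accuracy is the fraction of ground-truth entities recovered, that Sketch Accuracy is an exact match of the whole sketch, and that CAD F1 is the harmonic mean of entity-level precision and recall. All function names are illustrative.

```python
from collections import Counter
from typing import Hashable, Sequence

# Hypothetical encoding: one entity = one hashable tuple of quantized parameters.
Entity = Hashable


def _overlap(pred: Sequence[Entity], truth: Sequence[Entity]) -> int:
    """Count entities present in both sketches (multiset intersection)."""
    return sum((Counter(pred) & Counter(truth)).values())


def entity_accuracy(pred: Sequence[Entity], truth: Sequence[Entity]) -> float:
    """Assumed definition: fraction of ground-truth entities that were predicted."""
    if not truth:
        return 1.0 if not pred else 0.0
    return _overlap(pred, truth) / len(truth)


def sketch_accuracy(pred: Sequence[Entity], truth: Sequence[Entity]) -> float:
    """Assumed definition: 1.0 only if the sketches match exactly, ignoring order."""
    return 1.0 if Counter(pred) == Counter(truth) else 0.0


def cad_f1(pred: Sequence[Entity], truth: Sequence[Entity]) -> float:
    """Assumed definition: harmonic mean of entity-level precision and recall."""
    hits = _overlap(pred, truth)
    if hits == 0:
        return 0.0
    precision = hits / len(pred)
    recall = hits / len(truth)
    return 2 * precision * recall / (precision + recall)


# Toy example: two of three entities are predicted correctly.
truth = [("line", 0, 0, 4, 0), ("line", 4, 0, 4, 3), ("arc", 4, 3, 0, 0, 2)]
pred = [("line", 0, 0, 4, 0), ("line", 4, 0, 4, 3), ("circle", 2, 2, 1)]

print(entity_accuracy(pred, truth))
print(sketch_accuracy(pred, truth))
print(cad_f1(pred, truth))
```

Comparing entities as multisets keeps the scores independent of the order in which a model emits primitives. Whether the paper scores order-sensitively is not stated in this summary.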