Abstract: Parametric Computer-Aided Design (CAD) is central to contemporary mechanical design. However, it encounters challenges in achieving precise parametric sketch modeling and lacks practical evaluation metrics suitable for mechanical design. We harness the capabilities of pre-trained foundation models, renowned for their successes in natural language processing and computer vision, to develop generative models specifically for CAD. These models are adept at understanding complex geometries and design reasoning, a crucial advancement in CAD technology. In this paper, we propose CadVLM, an end-to-end vision language model for CAD generation. Our approach involves adapting pre-trained foundation models to manipulate engineering sketches effectively, integrating both sketch primitive sequences and sketch images. Extensive experiments demonstrate superior performance on multiple CAD sketch generation tasks such as CAD autocompletion, CAD autoconstraint, and image conditional generation. To our knowledge, this is the first instance of a multimodal Large Language Model (LLM) being successfully applied to parametric CAD generation, representing a pioneering step in the field of computer-aided mechanical design.
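The abstract describes giving a language model sketches as sequences of primitives and framing autocompletion as predicting the rest of a partial sketch. The following is a minimal sketch of that framing under stated assumptions: the `<line>`/`<circle>` tokens, the 64-bin coordinate quantization, and the helpers `serialize` and `autocomplete_pair` are hypothetical illustrations, not CadVLM's actual tokenization. The image branch (a rendering of the partial sketch) is omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Hypothetical primitive types (SketchGraphs also includes arcs, points, etc.).
@dataclass(frozen=True)
class Line:
    start: Tuple[float, float]
    end: Tuple[float, float]


@dataclass(frozen=True)
class Circle:
    center: Tuple[float, float]
    radius: float


Primitive = Union[Line, Circle]


def quantize(v: float, bins: int = 64) -> int:
    """Clamp v to [-1, 1] and map it to an integer bin in [0, bins - 1]."""
    v = max(-1.0, min(1.0, v))
    return min(bins - 1, int((v + 1.0) / 2.0 * bins))


def serialize(p: Primitive) -> str:
    """Turn one primitive into a text token string (hypothetical format)."""
    if isinstance(p, Line):
        (x0, y0), (x1, y1) = p.start, p.end
        return f"<line> {quantize(x0)} {quantize(y0)} {quantize(x1)} {quantize(y1)}"
    if isinstance(p, Circle):
        (cx, cy), r = p.center, p.radius
        return f"<circle> {quantize(cx)} {quantize(cy)} {quantize(r)}"
    raise TypeError(f"unsupported primitive: {type(p).__name__}")


def autocomplete_pair(sketch: List[Primitive], keep_fraction: float = 0.5) -> Tuple[str, str]:
    """Split a sketch into (given prefix, target completion) strings."""
    tokens = [serialize(p) for p in sketch]
    n_keep = max(1, int(len(tokens) * keep_fraction))
    return " ; ".join(tokens[:n_keep]), " ; ".join(tokens[n_keep:])


# A square with a concentric circular hole, coordinates normalized to [-1, 1].
sketch = [
    Line((-0.5, -0.5), (0.5, -0.5)),
    Line((0.5, -0.5), (0.5, 0.5)),
    Line((0.5, 0.5), (-0.5, 0.5)),
    Line((-0.5, 0.5), (-0.5, -0.5)),
    Circle((0.0, 0.0), 0.25),
]
prefix, target = autocomplete_pair(sketch)
print("given: ", prefix)
print("target:", target)
```

With five primitives and `keep_fraction=0.5`, the model would see the first two lines and be trained to emit the remaining two lines and the circle. Quantizing coordinates into a small vocabulary of integer bins keeps the sequences short and discrete, which is one common way to make continuous geometry digestible for a language model.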

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper aims to address several key issues in parametric Computer-Aided Design (CAD):

1. **Accurate Parametric Sketch Modeling**:
   - Parametric CAD is central to modern mechanical design, but current methods struggle to model parametric sketches precisely.
   - In conventional workflows, designers must complete geometric shapes and constraints by hand, which is time-consuming and error-prone.
2. **Lack of Suitable Evaluation Metrics for Mechanical Design**:
   - Existing CAD generation work lacks practical metrics for judging generated sketches from a mechanical-design perspective.
3. **Underuse of Multimodal Data**:
   - Existing CAD generation models mainly rely on a single modality, such as text or images, rather than combining images and text to improve generation quality.

### Solution

To address these issues, the paper proposes **CadVLM** (CAD Vision Language Model), an end-to-end multimodal vision language model designed for CAD generation tasks:

- **Multimodal Fusion**: CadVLM adapts pre-trained vision and language models so that it can process both sequences of sketch primitives and rendered images of engineering sketches.
- **Automatic Generation**: Given a partially specified sketch, CadVLM generates the remaining geometric primitives and constraints.
- **New Evaluation Metrics**: The paper introduces three metrics, Entity Accuracy, Sketch Accuracy, and CAD F1 Score, to quantify the quality of generated CAD sketches.

### Main Contributions

1. **Proposing CadVLM**: To the authors' knowledge, the first multimodal LLM successfully applied to parametric CAD generation, combining visual and textual representations of sketches.
2. **Introducing New Evaluation Metrics**: Three metrics for quantifying the quality of generated CAD sketches.
3. **Superior Performance**: On the SketchGraphs dataset, CadVLM outperforms existing baseline models on the CAD autocompletion and autoconstraint tasks.

Through these contributions, CadVLM offers a new, efficient approach to parametric CAD generation that could simplify and accelerate the mechanical design process.
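The three metrics are named here but not defined. One plausible reading is shown below as an assumption rather than the paper's exact definitions. Each sketch is a list of serialized entities. Entity Accuracy is the fraction of ground-truth entities that the prediction reproduces exactly. Sketch Accuracy is the fraction of sketches whose predicted entities match the ground truth exactly. CAD F1 averages a per-sketch precision/recall F1 over entities. The paper may match entities differently (for example, with a geometric tolerance) or also score constraints. The function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

Sketch = Sequence[str]  # a sketch as a list of serialized entities (assumption)


def match_count(pred: Sketch, truth: Sketch) -> int:
    """Entities shared by prediction and ground truth (multiset intersection)."""
    return sum((Counter(pred) & Counter(truth)).values())


def entity_accuracy(preds: List[Sketch], truths: List[Sketch]) -> float:
    """Fraction of all ground-truth entities reproduced exactly."""
    matched = sum(match_count(p, t) for p, t in zip(preds, truths))
    total = sum(len(t) for t in truths)
    return matched / total if total else 0.0


def sketch_accuracy(preds: List[Sketch], truths: List[Sketch]) -> float:
    """Fraction of sketches whose predicted entities equal the ground truth exactly."""
    exact = sum(Counter(p) == Counter(t) for p, t in zip(preds, truths))
    return exact / len(truths) if truths else 0.0


def sketch_f1(pred: Sketch, truth: Sketch) -> float:
    """Per-sketch F1 over entities; 0.0 when nothing matches."""
    m = match_count(pred, truth)
    if m == 0:
        return 0.0
    precision, recall = m / len(pred), m / len(truth)
    return 2 * precision * recall / (precision + recall)


def cad_f1(preds: List[Sketch], truths: List[Sketch]) -> float:
    """Mean per-sketch F1 across the evaluation set."""
    scores = [sketch_f1(p, t) for p, t in zip(preds, truths)]
    return sum(scores) / len(scores) if scores else 0.0


truths = [["<line> 48 48 16 48", "<line> 16 48 16 16"], ["<circle> 32 32 40"]]
preds = [["<line> 48 48 16 48", "<line> 16 48 16 20"], ["<circle> 32 32 40"]]
print(entity_accuracy(preds, truths), sketch_accuracy(preds, truths), cad_f1(preds, truths))
```

In this toy set, the first prediction gets one of its two lines right (the other has one coordinate off by four bins). Entity Accuracy is therefore 2/3, Sketch Accuracy is 1/2, and CAD F1 is (0.5 + 1.0) / 2 = 0.75. Exact string matching is deliberately strict; a tolerance-based match would give partial credit to the near-miss line.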

CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches
