OpenECAD: An Efficient Visual Language Model for Editable 3D-CAD Design

Zhe Yuan, Jianqi Shi, Yanhong Huang

2024-08-06

Abstract: Computer-aided design (CAD) tools are utilized in the manufacturing industry for modeling everything from cups to spacecraft. These programs are complex to use and typically require years of training and experience to master. Structured and well-constrained 2D sketches and 3D constructions are crucial components of CAD modeling. A well-executed CAD model can be seamlessly integrated into the manufacturing process, thereby enhancing production efficiency. Deep generative models of 3D shapes and 3D object reconstruction models have garnered significant research interest. However, most of these models produce discrete forms of 3D objects that are not editable. Moreover, the few models based on CAD operations often have substantial input restrictions. In this work, we fine-tuned pre-trained models to create OpenECAD models (0.55B, 0.89B, 2.4B and 3.1B), leveraging the visual, logical, coding, and general capabilities of visual language models. OpenECAD models can process images of 3D designs as input and generate highly structured 2D sketches and 3D construction commands, ensuring that the designs are editable. These outputs can be directly used with existing CAD tools' APIs to generate project files. To train our network, we created a series of OpenECAD datasets. These datasets are derived from existing public CAD datasets, adjusted and augmented to meet the specific requirements of visual language model (VLM) training. Additionally, we have introduced an approach that utilizes dependency relationships to define and generate sketches, further enriching the content and functionality of the datasets.
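The abstract does not spell out how dependency relationships define a sketch. One plausible reading, offered only as a minimal illustration, is that higher-level entities (curves) refer to lower-level ones (points) by name, so a change to a point propagates to every curve that depends on it. The `Point`, `Line`, and `Sketch` names below are hypothetical and are not the OpenECAD command set or any CAD tool's API.

```python
import math
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float


@dataclass
class Line:
    # A line stores the *names* of its endpoints, not their coordinates,
    # so it depends on whatever the sketch currently holds for those points.
    start: str
    end: str


class Sketch:
    """Hypothetical sketch container: lines depend on named points."""

    def __init__(self):
        self.points = {}
        self.lines = []

    def add_point(self, name, x, y):
        self.points[name] = Point(x, y)

    def add_line(self, start, end):
        # Reject a line whose dependencies have not been defined yet.
        for name in (start, end):
            if name not in self.points:
                raise KeyError(f"undefined point: {name}")
        self.lines.append(Line(start, end))

    def perimeter(self):
        # Resolve each line's endpoints at evaluation time.
        total = 0.0
        for line in self.lines:
            p, q = self.points[line.start], self.points[line.end]
            total += math.dist((p.x, p.y), (q.x, q.y))
        return total


# A 4 x 2 rectangle built from four shared corner points.
sketch = Sketch()
for name, (x, y) in {"a": (0, 0), "b": (4, 0), "c": (4, 2), "d": (0, 2)}.items():
    sketch.add_point(name, x, y)
for start, end in [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]:
    sketch.add_line(start, end)

perimeter_before = sketch.perimeter()

# Editing two points widens the rectangle; the dependent lines follow.
sketch.add_point("b", 6, 0)
sketch.add_point("c", 6, 2)
perimeter_after = sketch.perimeter()
```

Because the lines hold references rather than copies, the edit touches two points and leaves the four line definitions unchanged.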

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses two obstacles to practical 3D Computer-Aided Design (CAD): the difficulty of using CAD tools and the non-editability of most machine-generated 3D models. Specifically:

1. **Complexity of existing CAD tools**: Existing CAD tools are powerful, but they are complex to operate and typically require years of training and experience to master.
2. **Non-editability of generated 3D models**: Most existing 3D shape generation models produce discrete representations (such as point clouds, voxel grids, or polygon meshes) that are usually not editable. This makes modifications during the actual design and production process difficult.
3. **Input limitations**: The few models based on CAD operations often place significant restrictions on their input, such as requiring existing 3D point clouds or detailed hand-drawn information.

To overcome these issues, the authors propose the OpenECAD models. These are obtained by fine-tuning pre-trained Visual Language Models (VLMs); they take images of 3D designs as input and generate highly structured 2D sketches and 3D construction commands, which keeps the design editable. These outputs can be used directly with the APIs of existing CAD tools to generate project files. The authors have also created a series of OpenECAD datasets, derived from existing public CAD datasets and adjusted and augmented to meet the specific requirements of VLM training.

In summary, the paper aims to improve the editability and usability of 3D design through the OpenECAD models, lowering the barrier to using CAD tools and thereby enhancing production efficiency in manufacturing.
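To make the editability point concrete, the sketch below contrasts with a mesh by keeping a model as a short sequence of parametric commands: a 2D profile followed by an extrusion. Changing one parameter and re-evaluating yields a new solid. The `SketchPolygon` and `Extrude` classes are hypothetical stand-ins chosen for illustration; they are not the command vocabulary used by OpenECAD or by any particular CAD API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchPolygon:
    # Closed 2D profile given as an ordered tuple of (x, y) vertices.
    vertices: tuple


@dataclass(frozen=True)
class Extrude:
    # Distance the profile is swept along its plane normal.
    depth: float


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def solid_volume(sketch, extrude):
    """Volume of the prism obtained by extruding the sketch."""
    return polygon_area(sketch.vertices) * abs(extrude.depth)


# A 4 x 2 plate, 1 unit thick, expressed as two commands.
plate_sketch = SketchPolygon(vertices=((0, 0), (4, 0), (4, 2), (0, 2)))
plate_extrude = Extrude(depth=1.0)
original_volume = solid_volume(plate_sketch, plate_extrude)

# "Editing" the design means changing one parameter and re-evaluating.
thicker_extrude = replace(plate_extrude, depth=3.0)
edited_volume = solid_volume(plate_sketch, thicker_extrude)
```

A mesh of the same plate would need its vertices moved individually to achieve the same change, which is the practical sense in which discrete representations are hard to edit.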
