Abstract: The creation of manufacturable and editable 3D shapes through Computer-Aided Design (CAD) remains a highly manual and time-consuming task, hampered by the complex topology of boundary representations of 3D solids and unintuitive design tools. This paper introduces GenCAD, a generative model that employs autoregressive transformers and latent diffusion models to transform image inputs into parametric CAD command sequences, resulting in editable 3D shape representations. GenCAD integrates an autoregressive transformer-based architecture with a contrastive learning framework, enhancing the generation of CAD programs from input images and providing a representation learning framework for multiple data modalities relevant to engineering designs. Extensive evaluations demonstrate that GenCAD significantly outperforms existing state-of-the-art methods in terms of the precision and modifiability of generated 3D shapes. Notably, GenCAD shows a marked improvement in the accuracy of 3D shape generation for long sequences, supporting its application in complex design tasks. Additionally, the contrastive embedding of GenCAD facilitates the retrieval of CAD models from databases using image queries, which is a critical challenge within the CAD community. While most work in the 3D shape generation literature focuses on representations like meshes, voxels, or point clouds, practical engineering applications demand modifiability and the ability to perform multi-modal conditional generation. Our results provide a significant step forward in this direction, highlighting the potential of generative models to expedite the entire design-to-production pipeline and seamlessly integrate different design modalities.

CAD Translator: an Effective Drive for Text to 3D Parametric Computer-Aided Design Generative Modeling

Text2CAD: Generating Sequential CAD Models from Beginner-to-Expert Level Text Prompts

Text2CAD: Text to 3D CAD Generation via Technical Drawings

GenCAD: Image-Conditioned Computer-Aided Design Generation with Transformer-Based Contrastive Representation and Diffusion Priors

CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM

Computer-Aided Design as Language

CAD-LLM: Large Language Model for CAD Generation

Draw Step by Step: Reconstructing CAD Construction Sequences from Point Clouds Via Multimodal Diffusion

FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models

CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches

TransCAD: A Hierarchical Transformer for CAD Sequence Inference from Point Clouds

DiffCAD: Weakly-Supervised Probabilistic CAD Model Retrieval and Alignment from an RGB Image

Twins-Mix: Self Mixing in Latent Space for Reasonable Data Augmentation of 3D Computer-Aided Design Generative Modeling

OpenECAD: An Efficient Visual Language Model for Editable 3D-CAD Design

ContrastCAD: Contrastive Learning-Based Representation Learning for Computer-Aided Design Models

Img2CAD: Conditioned 3D CAD Model Generation from Single Image with Structured Visual Geometry

Img2CAD: Reverse Engineering 3D CAD Models from Images through VLM-Assisted Conditional Factorization

Self-supervised Graph Neural Network for Mechanical CAD Retrieval

Generating CAD Code with Vision-Language Models for 3D Designs