Abstract: Existing 3D model reconstruction methods typically produce outputs in the form of voxels, point clouds, or meshes. However, each of these representations has limitations and may not be suitable for every scenario. For instance, the resulting model may exhibit a rough surface and distorted structure, making manual editing and post-processing difficult. In this paper, we introduce a novel 3D reconstruction method designed to address these issues. We trained a visual transformer to predict a "scene descriptor" from a single wireframe image. This descriptor contains the key information about the scene, including object types and parameters such as position, rotation, and size. With the predicted parameters, a 3D scene can be reconstructed using 3D modeling software that provides a programmable interface, such as Blender or Rhino Grasshopper, resulting in 3D models that are easy to edit in fine detail. To evaluate the proposed model, we created two datasets: one featuring simple scenes and another with complex scenes. The test results demonstrate the model's ability to accurately reconstruct simple scenes but reveal that it struggles with more complex ones.

What problem does this paper attempt to address?

The paper aims to reconstruct 3D scenes from 2D hand-drawn sketches and to integrate the generated 3D models seamlessly into conventional 3D modeling software, so that architects and designers can work efficiently with easy-to-edit models. Specifically, existing 3D reconstruction methods, which output voxels, point clouds, or meshes, have the following problems:

- The reconstructed model has a rough surface and distorted structure, which makes manual editing and post-processing difficult.
- They are poorly suited to architectural structures, which are usually combinations of simple geometric shapes (such as rectangular boxes or pyramids) that these methods cannot represent accurately.
- The models they generate are difficult to modify and optimize during the design process.

To solve these problems, the paper proposes a method based on a Visual Transformer that reconstructs a 3D scene in three steps:

1. **Input image**: Use a single wireframe image as input.
2. **Predict scene descriptors**: Train a Visual Transformer to predict a "scene descriptor", which contains the types of all objects in the scene and their parameters (position, rotation, size, etc.).
3. **Generate 3D model**: Use 3D modeling software such as Rhino Grasshopper to read the predicted parameters and construct the 3D scene.

To evaluate the model, the authors created two datasets:

- **Simple scene dataset**: Contains simple 3D scenes and is used for preliminary verification of the model's effectiveness.
- **Complex scene dataset**: Contains complex 3D scenes and is used to test the model's performance in more demanding situations.

The experimental results show that the model reconstructs simple scenes well but encounters challenges in complex scenes. This indicates that the method is accurate and practical for simple scenes but needs further improvement for complex ones.
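To make the idea of a scene descriptor more concrete, the following is a minimal Python sketch of how such a descriptor might be stored and turned into geometry. It is an illustration under stated assumptions, not the paper's format: the `Primitive` class, its field names, the JSON layout, and the convention that rotation is a single angle about the vertical axis are all hypothetical. The `box_corners` function stands in for the work that a modeling tool such as Rhino Grasshopper or Blender would do when it reads the predicted parameters.

```python
import json
import math
from dataclasses import dataclass, asdict


@dataclass
class Primitive:
    """One object in a hypothetical scene descriptor (field names are assumptions)."""
    kind: str            # object type, e.g. "box" or "pyramid" (assumed labels)
    position: tuple      # (x, y, z) of the object's centre
    rotation_deg: float  # rotation about the vertical z axis (assumed convention)
    size: tuple          # (width, depth, height)


def box_corners(p):
    """Return the 8 corners of a box primitive in world coordinates."""
    w, d, h = p.size
    cx, cy, cz = p.position
    angle = math.radians(p.rotation_deg)
    c, s = math.cos(angle), math.sin(angle)
    footprint = ((-w / 2, -d / 2), (w / 2, -d / 2), (w / 2, d / 2), (-w / 2, d / 2))
    corners = []
    for dz in (-h / 2, h / 2):          # bottom face, then top face
        for dx, dy in footprint:
            # Rotate the local offset about z, then translate to the centre.
            corners.append((cx + c * dx - s * dy, cy + s * dx + c * dy, cz + dz))
    return corners


def to_json(scene):
    """Serialize a list of primitives so another tool could read the parameters."""
    return json.dumps([asdict(p) for p in scene])


def from_json(text):
    """Rebuild the list of primitives from the JSON produced by to_json."""
    return [
        Primitive(d["kind"], tuple(d["position"]), d["rotation_deg"], tuple(d["size"]))
        for d in json.loads(text)
    ]


# Example: a single box, 2 wide, 4 deep, 6 high, centred at the origin.
scene = [Primitive("box", (0, 0, 0), 0.0, (2, 4, 6))]
restored = from_json(to_json(scene))
corners = box_corners(restored[0])
```

Because each object is a handful of named parameters rather than a dense mesh, a designer can change a single value, such as a box's height, and regenerate the geometry, which is the editability advantage the summary describes.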

Sketch2CADScript: 3D Scene Reconstruction from 2D Sketch using Visual Transformer and Rhino Grasshopper
