Sketch2CADScript: 3D Scene Reconstruction from 2D Sketch using Visual Transformer and Rhino Grasshopper

Hong-Bin Yang
DOI: https://doi.org/10.48550/arXiv.2309.16850
2023-09-29
Abstract: Existing 3D model reconstruction methods typically produce outputs in the form of voxels, point clouds, or meshes. However, each of these approaches has its limitations and may not be suitable for every scenario. For instance, the resulting model may exhibit a rough surface and distorted structure, making manual editing and post-processing challenging for humans. In this paper, we introduce a novel 3D reconstruction method designed to address these issues. We trained a visual transformer to predict a "scene descriptor" from a single wire-frame image. This descriptor encompasses crucial information, including object types and parameters such as position, rotation, and size. With the predicted parameters, a 3D scene can be reconstructed using 3D modeling software like Blender or Rhino Grasshopper which provides a programmable interface, resulting in finely and easily editable 3D models. To evaluate the proposed model, we created two datasets: one featuring simple scenes and another with complex scenes. The test results demonstrate the model's ability to accurately reconstruct simple scenes but reveal its challenges with more complex ones.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the problem of reconstructing 3D scenes from 2D hand-drawn sketches and integrating the resulting 3D models seamlessly into conventional 3D modeling software, so that architects and designers can work with models that are efficient to produce and easy to edit. Specifically, existing 3D reconstruction methods, which output voxels, point clouds, or meshes, have the following problems:

- The reconstructed model has a rough surface and distorted structure, which makes manual editing and post-processing difficult.
- These methods are poorly suited to architectural structures, which are usually combinations of simple geometric shapes (such as rectangular boxes or pyramids) that these representations cannot capture accurately.
- The generated models are difficult to modify and optimize during the design process.

To solve these problems, the paper proposes a method based on a Visual Transformer that reconstructs a 3D scene in the following steps:

1. **Input image**: Use a single wireframe image as input.
2. **Predict scene descriptors**: Train a Visual Transformer to predict a "scene descriptor", which contains the types of all objects in the scene and their parameters (position, rotation, size, etc.).
3. **Generate 3D model**: Use 3D modeling software such as Rhino Grasshopper to read the predicted parameters and construct the 3D scene.

To evaluate the model, the author created two datasets:

- **Simple scene dataset**: Contains simple 3D scenes and is used for preliminary verification of the model's effectiveness.
- **Complex scene dataset**: Contains complex 3D scenes and is used to test the model's performance in more demanding situations.

The experimental results show that the model reconstructs simple scenes accurately but struggles with complex scenes, so the method is practical for simple scenes and needs further improvement for complex ones.
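To make the idea of a scene descriptor more concrete, the following is a minimal Python sketch of how object types and parameters could be turned into editable geometry. It is an illustration under stated assumptions, not the paper's implementation: the `SceneObject` fields, the JSON layout, the choice of a single rotation about the vertical axis, and the helper names are all hypothetical, and no Rhino or Grasshopper API is called. In a real pipeline, the predicted parameters would instead be handed to the modeling software's own scripting interface.

```python
import json
import math
from dataclasses import dataclass, asdict


@dataclass
class SceneObject:
    """One entry of a hypothetical scene descriptor (field names are assumptions)."""
    kind: str            # object type, e.g. "box" or "pyramid"
    position: tuple      # (x, y, z) of the center of the base
    rotation_deg: float  # rotation about the vertical (z) axis, in degrees
    size: tuple          # (width, depth, height)


def base_corners(obj):
    """Return the four corners of the rotated rectangular footprint."""
    w, d, _ = obj.size
    cx, cy, cz = obj.position
    a = math.radians(obj.rotation_deg)
    offsets = [(-w / 2, -d / 2), (w / 2, -d / 2), (w / 2, d / 2), (-w / 2, d / 2)]
    corners = []
    for dx, dy in offsets:
        # Standard 2D rotation of the offset, then translation to the center.
        x = cx + dx * math.cos(a) - dy * math.sin(a)
        y = cy + dx * math.sin(a) + dy * math.cos(a)
        corners.append((x, y, cz))
    return corners


def vertices(obj):
    """Expand one descriptor entry into the vertices of a simple primitive."""
    base = base_corners(obj)
    cx, cy, cz = obj.position
    h = obj.size[2]
    if obj.kind == "box":
        # Four base corners plus the same corners lifted by the height.
        return base + [(x, y, z + h) for x, y, z in base]
    if obj.kind == "pyramid":
        # Four base corners plus a single apex above the base center.
        return base + [(cx, cy, cz + h)]
    raise ValueError(f"unsupported object type: {obj.kind}")


def to_json(scene):
    """Serialize a list of SceneObject entries to a JSON string."""
    return json.dumps([asdict(o) for o in scene])


def from_json(text):
    """Parse a JSON string back into a list of SceneObject entries."""
    return [
        SceneObject(d["kind"], tuple(d["position"]), d["rotation_deg"], tuple(d["size"]))
        for d in json.loads(text)
    ]


# A two-object scene written as if it were a model prediction.
scene = [
    SceneObject("box", (0.0, 0.0, 0.0), 90.0, (2.0, 4.0, 3.0)),
    SceneObject("pyramid", (5.0, 1.0, 0.0), 0.0, (2.0, 2.0, 4.0)),
]
restored = from_json(to_json(scene))
```

Because each object is stored as a handful of named parameters rather than as a mesh, editing the scene amounts to changing a number (for example, a box height) and regenerating the geometry, which is the editability advantage the summary describes.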