Abstract: Currently, object detection applications in construction are almost entirely based on pure 2D data (both image and annotation are 2D-based), so the resulting artificial intelligence (AI) applications are applicable only to scenarios that require only 2D information. However, most advanced applications require AI agents to perceive 3D spatial information, and this gap limits the further development of computer vision (CV) in construction. The lack of 3D annotated datasets for construction object detection worsens the situation. Therefore, this study creates and releases a virtual dataset with 3D annotations named VCVW-3D, which covers 15 construction scenes and involves ten categories of construction vehicles and workers. The VCVW-3D dataset is characterized by multi-scene, multi-category, multi-randomness, multi-viewpoint, multi-annotation, and binocular vision. Several typical 2D and monocular 3D object detection models are then trained and evaluated on the VCVW-3D dataset to provide a benchmark for subsequent research. The VCVW-3D dataset is expected to bring considerable economic benefits and practical significance by reducing the costs of data construction, prototype development, and exploration of space-awareness applications, thus promoting the development of CV in construction, especially 3D applications.

What problem does this paper attempt to address?

This paper is primarily dedicated to addressing the limitations of computer vision (CV) technology in the construction industry concerning 3D spatial perception. Specifically, most current object detection applications based on purely 2D data cannot meet the needs of advanced management activities that require 3D spatial information. To solve this problem, the authors created a virtual 3D dataset named VCVW-3D.

### Research Background and Objectives

- **Background**: Currently, object detection applications in the construction field are mostly based on 2D images and annotated data, resulting in AI models lacking the ability to perceive 3D space. This limits the application of these models in scenarios requiring spatial information, such as judging high-altitude or edge operations, and the spatial distance between large construction vehicles and workers.
- **Objective**: To construct a virtual dataset containing 3D annotation information, VCVW-3D, to promote the development of computer vision in the construction field, especially for 3D applications.

### Solution

1. **Dataset Features**:
   - Multi-scene (15 indoor and outdoor scenes)
   - Multi-category (involving 10 types of construction vehicles and workers)
   - Multi-randomness (random changes in the number, position, angle, and color of objects)
   - Multi-view
   - Multi-annotation (2D/3D bounding boxes, 2D semantic/instance segmentation, depth maps)
   - Stereo vision
2. **Application Scenarios**:
   - The dataset covers various construction activities, such as concrete pouring, rebar tying, road paving, and earth excavation.
3. **Contributions**:
   - Significantly reduces the difficulty of 3D data collection and annotation by generating and annotating data in a controlled and customizable manner.
   - Provides benchmarks for 2D and monocular 3D object detection, offering references for subsequent research.
   - Helps reduce data construction costs and promotes the exploration of more 3D CV research, especially those requiring 3D object detection applications.

In summary, this paper aims to promote the application and development of computer vision technology in the construction industry by creating a virtual dataset rich in 3D information, particularly enhancing the capability of 3D spatial perception.
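To make the motivating scenario concrete, the spatial distance between workers and large construction vehicles could be checked from 3D box annotations roughly as follows. This is a minimal sketch, not the authors' method or the dataset's actual annotation format: the `Box3D` record, the `"worker"` category label, the metre units, and the 5 m threshold are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Box3D:
    """Hypothetical 3D box annotation (not the real VCVW-3D schema).

    The center is assumed to be in metres in a frame shared by all boxes.
    """
    category: str
    x: float
    y: float
    z: float


def center_distance(a: Box3D, b: Box3D) -> float:
    """Euclidean distance between two box centers."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


def proximity_alerts(boxes, min_distance_m=5.0):
    """Return (worker, vehicle, distance) triples closer than the threshold.

    Every box whose category is not "worker" is treated as a vehicle,
    which is an assumption made for this sketch.
    """
    workers = [b for b in boxes if b.category == "worker"]
    vehicles = [b for b in boxes if b.category != "worker"]
    alerts = []
    for w in workers:
        for v in vehicles:
            d = center_distance(w, v)
            if d < min_distance_m:
                alerts.append((w, v, d))
    return alerts


if __name__ == "__main__":
    scene = [
        Box3D("worker", 0.0, 0.0, 10.0),
        Box3D("excavator", 3.0, 0.0, 14.0),   # close to the worker
        Box3D("dump_truck", 20.0, 0.0, 10.0),  # far from the worker
    ]
    for worker, vehicle, dist in proximity_alerts(scene, min_distance_m=6.0):
        print(f"{worker.category} within {dist:.1f} m of {vehicle.category}")
```

A check like this is only possible with 3D annotations or predictions; with 2D boxes alone, the pixel gap between two objects says little about their real separation, which is the limitation the paper sets out to address.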

VCVW-3D: A Virtual Construction Vehicles and Workers Dataset with 3D Annotations
