Abstract: Sensing and reality capture devices are widely used on construction sites. Among the available technologies, vision-based sensors are by far the most common. A large volume of images and videos is collected from construction projects every day to track work progress, measure productivity, litigate claims, and monitor safety compliance. Manual interpretation of such colossal amounts of data, however, is non-trivial, error-prone, and resource-intensive. This has motivated new research on soft computing methods that utilize high-power data processing, computer vision, and deep learning (DL) in the form of convolutional neural networks (CNNs). A fundamental step toward machine-driven interpretation of construction site scenery is to accurately identify objects of interest for a particular problem. The accuracy requirement, however, may come at the cost of the computational speed of the candidate method. While computationally heavy DL algorithms (e.g., Mask R-CNN) can perform visual recognition with relatively high accuracy, they suffer from low processing efficiency, which hinders their use in real-time decision-making. One of the most promising DL algorithms that balance speed and accuracy is YOLO (you-only-look-once). This paper investigates YOLO-based CNN models for fast detection of construction objects. First, a large-scale image dataset, named Pictor-v2, is created, which contains about 3,500 images and approximately 11,500 instances of common construction site objects (e.g., building, equipment, worker). To assess the agility of object detection, transfer learning is used to train two variations of this model, namely, YOLO-v2 and YOLO-v3, and test them on different data combinations (crowdsourced, web-mined, or both). Results indicate that performance is higher if the model is trained on both crowdsourced and web-mined images. Additionally, YOLO-v3 outperforms YOLO-v2 by focusing on smaller, harder-to-detect objects. The best-performing YOLO-v3 model has a 78.2% mAP when tested on crowdsourced data. Sensitivity analysis of the output shows that the model's strong suit is in detecting larger objects in less crowded and well-lit spaces. The proposed methodology can also be extended to predict the relative distance of the detected objects with reliable accuracy. Findings of this work lay the foundation for further research on technology-assistive systems to augment human capacities in quickly and reliably interpreting visual data in complex environments.
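
The abstract reports detection quality as mean average precision (mAP). The following is a minimal sketch of how such a score is commonly computed, not the authors' evaluation code. It assumes boxes given as (x_min, y_min, x_max, y_max), an IoU threshold of 0.5, all-point interpolation of the precision-recall curve, and a single image per class for brevity; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[float, Box]],  # (confidence score, predicted box)
    truths: List[Box],                    # ground-truth boxes of the same class
    iou_thresh: float = 0.5,              # assumed threshold, not stated in the abstract
) -> float:
    """AP for one class on one image (simplification for illustration)."""
    if not truths:
        return 0.0
    matched = [False] * len(truths)
    hits: List[bool] = []
    # Visit detections from most to least confident.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, -1
        for i, truth in enumerate(truths):
            overlap = iou(box, truth)
            if overlap > best_iou:
                best_iou, best_idx = overlap, i
        # A true positive must overlap enough with a not-yet-matched ground truth.
        hit = best_idx >= 0 and best_iou >= iou_thresh and not matched[best_idx]
        if hit:
            matched[best_idx] = True
        hits.append(hit)

    precisions: List[float] = []
    recalls: List[float] = []
    true_positives = 0
    for rank, hit in enumerate(hits, start=1):
        true_positives += int(hit)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / len(truths))

    # Make precision non-increasing in recall (all-point interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: Dict[str, float]) -> float:
    """mAP is the unweighted mean of the per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)
```

In a full evaluation, detections would be pooled across every test image before ranking, and the per-class AP values (e.g., for building, equipment, and worker) would then be averaged.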

Multispectral Object Detection with Deep Learning

A Lightweight SE-YOLOv3 Network for Multi-Scale Object Detection in Remote Sensing Imagery

Multispectral Object Detection Based on Multilevel Feature Fusion and Dual Feature Modulation

Object Detection in Multispectral Remote Sensing Images Based on Cross-Modal Cross-Attention

GEM: Glare or Gloom, I Can Still See You -- End-to-End Multimodal Object Detection

Multiscale Domain Adaptive YOLO for Cross-Domain Object Detection

Hyperspectral Image Target Recognition Based on YOLO Model

Multispectral Deep Neural Network Fusion Method for Low-Light Object Detection

Borrow from Anywhere: Pseudo Multi-modal Object Detection in Thermal Imagery

MA-YOLO: a multi-attention object detection network for remote sensing images

Real Time Multi-Class Object Detection and Recognition Using Vision Augmentation Algorithm

Surveying You Only Look Once (YOLO) Multispectral Object Detection Advancements, Applications And Challenges

Deep Domain Adaptation Based Multi-Spectral Salient Object Detection

Multiscale and Direction Target Detecting in Remote Sensing Images via Modified YOLO-v4

Deep learning based object detection from multi-modal sensors: an overview

Deep Convolutional Networks for Construction Object Detection Under Different Visual Conditions

MRD-YOLO: A Multispectral Object Detection Algorithm for Complex Road Scenes

Dual-YOLO Architecture from Infrared and Visible Images for Object Detection

YOLO-CIR: The network based on YOLO and ConvNeXt for infrared object detection

Research on Single Object Detection Technology Based on Infrared Multi-spectrum Fusion

YOLOSR-IST: A Deep Learning Method for Small Target Detection in Infrared Remote Sensing Images based on Super-Resolution and YOLO