Abstract: Sensing and reality capture devices are widely used on construction sites. Among different technologies, vision-based sensors are by far the most common and ubiquitous. A large volume of images and videos is collected from construction projects every day to track work progress, measure productivity, litigate claims, and monitor safety compliance. Manual interpretation of such colossal amounts of data, however, is non-trivial, error-prone, and resource-intensive. This has motivated new research on soft computing methods that utilize high-power data processing, computer vision, and deep learning (DL) in the form of convolutional neural networks (CNNs). A fundamental step toward machine-driven interpretation of construction site scenery is to accurately identify objects of interest for a particular problem. The accuracy requirement, however, may offset the computational speed of the candidate method. While region-based DL algorithms (e.g., Mask R-CNN) can perform visual recognition with relatively high accuracy, they suffer from low processing efficiency, which hinders their use in real-time decision-making. One of the most promising DL algorithms that balance speed and accuracy is YOLO (you-only-look-once). This paper investigates YOLO-based CNN models for fast detection of construction objects. First, a large-scale image dataset, named Pictor-v2, is created, which contains about 3,500 images and approximately 11,500 instances of common construction site objects (e.g., building, equipment, worker). To assess the agility of object detection, transfer learning is used to train two variations of this model, namely, YOLO-v2 and YOLO-v3, and test them on different data combinations (crowdsourced, web-mined, or both). Results indicate that performance is higher if the model is trained on both crowdsourced and web-mined images. Additionally, YOLO-v3 outperforms YOLO-v2 by focusing on smaller, harder-to-detect objects. The best-performing YOLO-v3 model achieves a mean average precision (mAP) of 78.2% when tested on crowdsourced data. Sensitivity analysis of the output shows that the model's strong suit is in detecting larger objects in less crowded and well-lit spaces. The proposed methodology can also be extended to predict the relative distance of the detected objects with reliable accuracy. Findings of this work lay the foundation for further research on technology-assistive systems to augment human capacities in quickly and reliably interpreting visual data in complex environments.
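For readers unfamiliar with the mAP figure quoted above, the following is a minimal sketch of how such a score is commonly computed for one image and one class. It is not the evaluation code used in this work: the IoU threshold of 0.5, the all-point interpolation of the precision-recall curve, the function names, and the sample boxes and per-class values are all assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truths, iou_threshold=0.5):
    """AP for one class.

    detections: list of (confidence, box); ground_truths: list of boxes.
    The 0.5 threshold is an assumed, commonly used default.
    """
    if not ground_truths:
        return 0.0
    matched = [False] * len(ground_truths)
    tp_flags = []
    # Visit detections from most to least confident.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truths):
            if matched[idx]:
                continue  # each ground-truth box can be matched only once
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_iou >= iou_threshold:
            matched[best_idx] = True
            tp_flags.append(True)
        else:
            tp_flags.append(False)

    # Precision and recall after each ranked detection.
    precisions, recalls, tp = [], [], 0
    for rank, is_tp in enumerate(tp_flags, start=1):
        tp += int(is_tp)
        precisions.append(tp / rank)
        recalls.append(tp / len(ground_truths))

    # All-point interpolation: make precision non-increasing in recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """mAP is the unweighted mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Illustrative values only, not results from the paper.
truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 30))]
example_ap = average_precision(preds, truth)
example_map = mean_average_precision({"worker": 1.0, "equipment": 0.5})
```

In this toy case the second-ranked detection overlaps no ground-truth box, so it counts as a false positive and lowers the AP below 1 even though both objects are eventually found.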
