Abstract: Construction workplace hazard detection requires engineers to analyze scenes manually against many safety rules, which is time-consuming, labor-intensive, and error-prone. Computer vision algorithms have yet to reliably discriminate between the anomalous and benign object relations that underpin safety-violation detection. Recently developed deep-learning-based computer vision algorithms need tens of thousands of images, labeled with the safety rules violated, to train networks that acquire spatiotemporal reasoning capacity in complex workplaces. Such training requires human experts to label images and indicate whether the relationships among the workers, resources, and equipment in a scene violate the spatiotemporal arrangement rules for safe and productive operations. False alarms in those manual labels (labeling no-violation images as having violations) can significantly mislead the machine learning process and yield computer vision models that produce inaccurate hazard detections. Compared with false alarms, the other type of mislabel, false negatives (labeling images that have violations as "no violation"), seems to have less impact on the reliability of the trained computer vision models. This paper examines a new crowdsourcing approach that achieves above 95% accuracy in labeling images of complex construction scenes that contain safety-rule violations, with a focus on minimizing false alarms while keeping an acceptable rate of false negatives. The development and testing of this approach address two fundamental questions: (1) How can the impact of a short safety-rule training process on the labeling accuracy of non-professional image annotators be characterized? (2) How should the image labels contributed by ordinary people be aggregated to filter out false alarms while keeping an acceptable false negative rate?
In designing short training sessions for online image annotators, the research team split a large number of safety rules into smaller sets of six. Each online annotator learns six randomly assigned safety rules and then labels workplace images as "no violation" or "violation" of specific rules among the six learned. About 120 anonymous image annotators participated in the data collection. A Bayesian-network-based crowd consensus model then aggregated the annotators' labels to obtain the final safety-rule violation labels. Experimental results show that the proposed model can achieve close to 0% false alarm rates while keeping the false negative rate below 10%. This labeling performance exceeds that of existing crowdsourcing approaches that aggregate crowdsourced labels by majority vote. Given these findings, the presented crowdsourcing approach sheds light on effective construction safety surveillance by integrating human risk recognition capabilities into advanced computer vision.
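The contrast between majority voting and a probabilistic consensus can be illustrated with a small sketch. The abstract does not describe the structure of the Bayesian network, so the code below substitutes a much simpler naive-Bayes aggregation and should not be read as the authors' model. The per-annotator sensitivity and specificity values, the 0.95 decision threshold, and all function names are assumptions made for illustration only.

```python
from math import exp, log


def posterior_violation(votes, prior=0.5):
    """Return P(violation | votes), treating annotators as conditionally independent.

    votes: list of (label, sensitivity, specificity) tuples, where label is
    1 for "violation" and 0 for "no violation". The reliability values are
    hypothetical; in practice they would have to be estimated.
    """
    log_odds = log(prior / (1 - prior))
    for label, sens, spec in votes:
        if label == 1:
            # A "violation" vote: true-positive rate over false-positive rate.
            log_odds += log(sens / (1 - spec))
        else:
            # A "no violation" vote: false-negative rate over true-negative rate.
            log_odds += log((1 - sens) / spec)
    return 1 / (1 + exp(-log_odds))


def majority_vote(votes):
    """Baseline: label as violation if more than half of the votes say so."""
    positives = sum(label for label, _, _ in votes)
    return 1 if 2 * positives > len(votes) else 0


def aggregate(votes, threshold=0.95, prior=0.5):
    """Report a violation only when the posterior is high (assumed threshold),
    trading some false negatives for fewer false alarms."""
    return 1 if posterior_violation(votes, prior) >= threshold else 0


def error_rates(predicted, truth):
    """Return (false alarm rate, false negative rate) for binary labels."""
    false_alarms = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 0)
    misses = sum(1 for p, t in zip(predicted, truth) if p == 0 and t == 1)
    negatives = sum(1 for t in truth if t == 0)
    positives = sum(1 for t in truth if t == 1)
    far = false_alarms / negatives if negatives else 0.0
    fnr = misses / positives if positives else 0.0
    return far, fnr


# Two weak annotators flag a violation; one reliable annotator does not.
votes = [(1, 0.6, 0.6), (1, 0.6, 0.6), (0, 0.9, 0.95)]
print(majority_vote(votes))  # majority vote follows the two weak annotators
print(aggregate(votes))      # weighted consensus follows the reliable one
```

In this toy case the majority vote raises an alarm, while the weighted consensus does not, because the single reliable annotator outweighs the two weak ones. A high threshold of this kind is one plausible way to push the false alarm rate toward zero at the cost of some false negatives.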


Crowdsourced Reliable Labeling of Safety-Rule Violations on Images of Complex Construction Scenes for Advanced Vision-Based Workplace Safety
