A Feature-Level Fusion-Based Multimodal Analysis of Recognition and Classification of Awkward Working Postures in Construction
Xiaer Xiahou,Zirui Li,Jikang Xia,Zhipeng Zhou,Qiming Li
DOI: https://doi.org/10.1061/jcemd4.coeng-13795
2023-01-01
Journal of Construction Engineering and Management
Abstract: Developing approaches for the recognition and classification of awkward working postures is of great significance for the proactive management of safety risks and work-related musculoskeletal disorders (WMSDs) in construction. Previous efforts have concentrated on wearable sensors or computer vision-based monitoring, and both have limitations that need further investigation. First, wearable sensor-based methods lack reliability because they are vulnerable to environmental interference. Second, conventional computer vision-based recognition classifies inaccurately under adverse environmental conditions, such as insufficient illumination and occlusion. To address these limitations, this study presents an automated approach for recognizing and classifying awkward working postures. The approach leverages multimodal data collected from various sensors and apparatuses, allowing for a comprehensive analysis of different modalities. A feature-level fusion strategy is employed to train deep learning-based networks, including a multilayer perceptron (MLP), a recurrent neural network (RNN), and a long short-term memory (LSTM) network. Among these networks, the LSTM model achieves the best performance, with an accuracy of 99.6% and an F1-score of 99.7%. A comparison of metrics between single-modality and multimodal-fused training methods demonstrates that multimodal fusion significantly enhances classification performance. Furthermore, the study examines the performance of the LSTM network under adverse environmental conditions. The accuracy of the model remains consistently above 90% in such conditions, indicating that the multimodal fusion strategy enhances the model's generalizability.
In conclusion, this study contributes to the body of knowledge on the proactive prevention of safety and health risks in the construction industry by offering an automated approach that adapts well to adverse conditions. Moreover, this attempt to integrate diverse data through multimodal fusion may provide inspiration for future studies.

Mitigating potential risk factors for work-related musculoskeletal disorders (WMSDs) and improving safety and health performance are crucial in construction projects. Construction workers are frequently exposed to prolonged periods of awkward working postures. In pursuit of a more comprehensive solution, an automated approach for the recognition and classification of such postures is developed. Specifically, this approach uses joint point data extracted from RGB images, fused with motion data representing the activity state and electroencephalogram (EEG) data representing the cognitive state. In testing, the approach demonstrates strong classification performance in deep-learning networks, with a maximum accuracy of 99.6%. Such high accuracy supports its potential for implementation in real construction management. Because construction sites are dynamic and complex, and conditions such as insufficient illumination and occlusion are common, automated identification methods often have limited utility. In response, the integrated approach in this study combines the information derived from diverse modalities, sustaining a high accuracy of 94.9% under these conditions. This demonstrates not only the performance of the new approach but also its generalizability, thereby enabling proactive management of ergonomic and safety risks on construction sites.
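To make the idea of feature-level fusion concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each modality (joint points, motion, EEG) has already been reduced to one feature vector per time-aligned frame; the per-modality z-score normalization, the sliding-window step, the function names, and the toy feature dimensions are all illustrative assumptions that the abstract does not specify. The fused windows are the kind of sequence input that an RNN or LSTM classifier would consume.

```python
from statistics import mean, pstdev


def zscore(frames):
    """Standardize each feature column of one modality (assumed preprocessing)."""
    columns = list(zip(*frames))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) or 1.0 for c in columns]  # avoid dividing by zero
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in frames]


def fuse_features(*modalities):
    """Feature-level fusion: concatenate time-aligned per-frame vectors."""
    if len({len(m) for m in modalities}) != 1:
        raise ValueError("modalities must have the same number of frames")
    normalized = [zscore(m) for m in modalities]
    # For each frame, join the vectors of all modalities into one long vector.
    return [sum(rows, []) for rows in zip(*normalized)]


def sliding_windows(frames, size, step):
    """Cut the fused sequence into fixed-length windows for a sequence model."""
    return [frames[i:i + size] for i in range(0, len(frames) - size + 1, step)]


# Toy data (hypothetical values and dimensions): 4 frames per modality.
joints = [[0.1, 0.2, 0.3, 0.4], [0.2, 0.3, 0.4, 0.5],
          [0.3, 0.1, 0.2, 0.6], [0.4, 0.2, 0.1, 0.7]]   # 4 joint features
motion = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3], [0.7, 0.2]]  # 2 motion features
eeg = [[10.0, 12.0, 9.0], [11.0, 12.5, 9.5],
       [10.5, 11.0, 8.0], [12.0, 13.0, 9.0]]              # 3 EEG features

fused = fuse_features(joints, motion, eeg)
windows = sliding_windows(fused, size=2, step=1)
print(len(fused), len(fused[0]), len(windows))  # prints: 4 9 3
```

Fusing at the feature level means a single classifier sees all modalities at once, so information from one modality can compensate when another is degraded, for example when joint points are unreliable under poor illumination or occlusion.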