Video surveillance-based multi-task learning with swin transformer for earthwork activity classification

Yanan Lu, Ke You, Cheng Zhou, Jiaxi Chen, Zhangang Wu, Yutian Jiang, Chao Huang
DOI: https://doi.org/10.1016/j.engappai.2023.107814
IF: 8
2024-01-02
Engineering Applications of Artificial Intelligence
Abstract: Bulldozers are pivotal in earthworks, yet their operations are traditionally supervised through labor-intensive and potentially unreliable manual methods. This research proposes a vision-based method for automating the monitoring of bulldozer operations. First, it develops a specialized deep learning dataset, the bulldozer earthmoving activity dataset. It then proposes a novel multi-task video classification network, the multi-task video transformer network (MTVTNet), built on the video Swin Transformer architecture. The network simultaneously recognizes the bulldozer's shoveling action and state and classifies the soil. Applied in a real-world construction setting, the model achieves a mean average precision of 99.68%. The method not only enables comprehensive automated supervision of bulldozer earthmoving activities but also provides a valuable data source for assessing the operational efficiency of these machines.
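The abstract describes one network that predicts three outputs at once from the same video clip. The following is a minimal sketch of that general multi-task pattern in plain Python, under stated assumptions: a pooled feature vector stands in for the output of the video Swin Transformer backbone (which is not modeled), each task gets its own linear softmax head, and training combines the per-task cross-entropy losses as a weighted sum with equal weights by default. The class counts, the `MultiTaskHeads` class and the `multitask_loss` function are hypothetical illustrations, not the authors' MTVTNet implementation.

```python
import math
import random
from typing import Dict, List, Optional, Sequence

Vector = List[float]
Matrix = List[List[float]]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over one head's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights: Matrix, bias: Vector, x: Sequence[float]) -> Vector:
    """One task-specific classification head: logits = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class MultiTaskHeads:
    """Hypothetical set of classification heads sharing one clip-level feature vector.

    The input features stand in for a pooled video-backbone embedding
    (a video Swin Transformer in the paper); the backbone is NOT modeled here.
    """

    def __init__(self, feature_dim: int, num_classes: Dict[str, int], seed: int = 0) -> None:
        rng = random.Random(seed)
        # One (weights, bias) pair per task; small random weights, zero bias.
        self.heads = {
            task: (
                [[rng.gauss(0.0, 0.1) for _ in range(feature_dim)] for _ in range(k)],
                [0.0] * k,
            )
            for task, k in num_classes.items()
        }

    def forward(self, features: Sequence[float]) -> Dict[str, Vector]:
        """Return per-task class probabilities computed from the SAME shared features."""
        return {task: softmax(linear(w, b, features)) for task, (w, b) in self.heads.items()}


def multitask_loss(
    probs: Dict[str, Vector],
    targets: Dict[str, int],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted sum of per-task cross-entropy losses (equal weights are an assumption)."""
    weights = weights or {task: 1.0 for task in probs}
    return sum(weights[t] * -math.log(probs[t][targets[t]]) for t in probs)


if __name__ == "__main__":
    # Hypothetical class counts for the three tasks named in the abstract.
    num_classes = {"action": 2, "state": 3, "soil": 4}
    model = MultiTaskHeads(feature_dim=8, num_classes=num_classes)
    features = [0.5, -1.0, 0.25, 0.0, 1.5, -0.5, 0.75, 2.0]  # placeholder backbone output
    probs = model.forward(features)
    preds = {task: max(range(len(p)), key=p.__getitem__) for task, p in probs.items()}
    print(preds, multitask_loss(probs, {"action": 0, "state": 1, "soil": 2}))
```

Sharing one backbone means each clip is encoded once for all three outputs, and only the lightweight heads are task-specific. How the paper actually balances the three tasks is not stated in the abstract, so the loss weights here are a tunable assumption.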
Automation & Control Systems; Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Engineering, Multidisciplinary
What problem does this paper attempt to address?