Automatic excavator action recognition and localisation for untrimmed video using hybrid LSTM-Transformer networks

Abbey Martin, Andrew J. Hill, Konstantin M. Seiler, Mehala Balamurali
Australian Centre for Field Robotics, Faculty of Engineering, University of Sydney, Sydney, Australia
DOI: https://doi.org/10.1080/17480930.2023.2290364
IF: 3.022
2023-12-14
International Journal of Mining, Reclamation and Environment
Abstract: In mining and construction, excavators are integral to earth-moving operations. Accurate knowledge of excavator activities may be used in productivity analysis to streamline delivery. This paper presents a computer vision-based method for excavator action detection that automatically infers the occurrence and duration of excavator actions from untrimmed video captured from the excavator cab. The model uses a three-stage architecture consisting of a VGG16 feature extractor, a four-stage Transformer Encoder-Long Short-Term Memory (LSTM) module, and a post-processing component. The model's predictive performance has been validated on the largest dataset among similar studies, comprising 567,000 frames filmed on-site during the day and at night. When tested on night and daytime videos, the model achieves accuracies of 90% and 70%, respectively, highlighting strong potential for practical implementation of the Transformer-LSTM network in excavator action detection. This study presents the first application of the combined Transformer-LSTM network for action detection in computer vision.
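The abstract does not say how the post-processing component works. As a minimal sketch, assuming it takes per-frame action labels and turns them into timed action segments, one common approach is to smooth the labels with a sliding majority vote and then merge consecutive identical labels. The function names `smooth_labels` and `labels_to_segments`, the window size, the frame rate, and the labels "dig" and "swing" are all hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import groupby


def smooth_labels(labels, window=5):
    """Replace each label by the majority label in a centred window.

    Assumption: a simple majority vote stands in for the paper's
    unspecified post-processing step.
    """
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        lo = max(0, i - half)
        hi = min(len(labels), i + half + 1)
        # most_common(1) returns the label with the highest count
        smoothed.append(Counter(labels[lo:hi]).most_common(1)[0][0])
    return smoothed


def labels_to_segments(labels, fps=30.0):
    """Merge runs of identical labels into (label, start_s, end_s) tuples."""
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, start / fps, (start + length) / fps))
        start += length
    return segments


if __name__ == "__main__":
    # Hypothetical per-frame predictions with one spurious "swing" frame.
    frames = ["dig"] * 4 + ["swing"] + ["dig"] * 4 + ["swing"] * 6
    for label, t0, t1 in labels_to_segments(smooth_labels(frames), fps=1.0):
        print(f"{label}: {t0:.1f}s to {t1:.1f}s")
```

With a window of five frames, the isolated "swing" frame is absorbed into the surrounding "dig" run, so the output contains one segment per sustained action, each with a start and end time.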
environmental sciences, mining & mineral processing