Residual deep gated recurrent unit-based attention framework for human activity recognition by exploiting dilated features

Ajeet Pandey, Piyush Kumar
DOI: https://doi.org/10.1007/s00371-024-03266-w
IF: 2.835
2024-02-08
The Visual Computer
Abstract: Human activity recognition (HAR) in video streams has become a thriving research area in computer vision and pattern recognition. Activity recognition in real-world video is demanding because of limited data covering variations in motion and style, and because of cluttered backgrounds. Current HAR approaches primarily rely on the pre-trained weights of various deep learning (DL) models to describe the appearance of frames during the learning phase. This affects the assessment of feature discrepancies, such as the separation between temporal and visual cues. To address this issue, this article introduces a residual deep gated recurrent unit (RD-GRU)-enabled attention framework with a dilated convolutional neural network (DiCNN). The approach targets potentially informative content in the input video frames to recognize the distinct activities in the videos. The DiCNN captures the crucial, distinctive features. A skip connection is employed with the DiCNN to update the information so that it retains more knowledge than a shallow layer. These features are then fed into an attention module to capture additional high-level discriminative patterns and signs associated with actions. The attention mechanism is followed by an RD-GRU, which learns from long video sequences to enhance performance. The model is evaluated using accuracy, precision, recall, and F1-score on four diverse benchmark datasets: UCF11, UCF Sports, JHMDB, and THUMOS. On these datasets, it achieves accuracies of 98.54%, 99.31%, 82.47%, and 95.23%, respectively. These results illustrate the validity of the proposed work in comparison with state-of-the-art (SOTA) methods.
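The processing order described in the abstract (dilated convolution with a skip connection, then attention, then a residual GRU) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no layer sizes, kernel shapes, or attention formulation, so everything below is an assumption made for illustration. Each frame is reduced to a single scalar feature, the convolution is 1-D over time, the attention score is a simple product with a hypothetical scalar `query`, and all weights in `run_pipeline` are made-up values. The classification head is omitted.

```python
import math


def dilated_conv1d(x, kernel, dilation):
    """Zero-padded, same-length 1-D dilated convolution over a list of floats."""
    center = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - center) * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(x):              # positions outside the input count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def residual_dilated_block(x, kernel, dilation):
    """ReLU(dilated conv) with a skip connection that adds the input back."""
    conv = dilated_conv1d(x, kernel, dilation)
    return [xi + max(0.0, ci) for xi, ci in zip(x, conv)]


def attention_weights(features, query):
    """Softmax over per-step scores; `query` is a hypothetical scalar parameter."""
    scores = [f * query for f in features]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gru_layer(xs, p):
    """Scalar GRU unrolled over a sequence; returns the hidden state at each step."""
    h = 0.0
    outs = []
    for x_t in xs:
        z = sigmoid(p["wz"] * x_t + p["uz"] * h)            # update gate
        r = sigmoid(p["wr"] * x_t + p["ur"] * h)            # reset gate
        h_cand = math.tanh(p["wh"] * x_t + p["uh"] * (r * h))
        h = (1.0 - z) * h + z * h_cand
        outs.append(h)
    return outs


def residual_gru_stack(xs, layers):
    """Stacked GRU layers; each layer's output is added to its own input."""
    for p in layers:
        ys = gru_layer(xs, p)
        xs = [x + y for x, y in zip(xs, ys)]
    return xs


def run_pipeline(frame_features):
    """Toy end-to-end pass with made-up weights (assumptions, not from the paper)."""
    feats = residual_dilated_block(frame_features, [0.5, 1.0, 0.5], dilation=2)
    weights = attention_weights(feats, query=1.0)
    attended = [w * f for w, f in zip(weights, feats)]
    p = {"wz": 0.3, "uz": 0.1, "wr": 0.2, "ur": 0.4, "wh": 0.5, "uh": 0.6}
    return residual_gru_stack(attended, [p, p])


if __name__ == "__main__":
    print(run_pipeline([0.1, 0.9, 0.4, 0.7, 0.2]))
```

With a dilation of 2 and a three-tap kernel, each output step sees inputs two frames apart, which widens the temporal receptive field without adding weights. The residual additions in both the convolutional block and the GRU stack let the input pass through unchanged when the learned branch contributes nothing.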
computer science, software engineering