Adaptive temporal compression for reduction of computational complexity in human behavior recognition

Haixin Huang, Yuyao Wang, Mingqi Cai, Ruipeng Wang, Feng Wen, Xiaojie Hu

DOI: https://doi.org/10.1038/s41598-024-61286-x

IF: 4.6

2024-05-10

Scientific Reports

Abstract: The research on video analytics especially in the area of human behavior recognition has become increasingly popular recently. It is widely applied in virtual reality, video surveillance, and video retrieval. With the advancement of deep learning algorithms and computer hardware, the conventional two-dimensional convolution technique for training video models has been replaced by three-dimensional convolution, which enables the extraction of spatio-temporal features. Specifically, the use of 3D convolution in human behavior recognition has been the subject of growing interest. However, the increased dimensionality has led to challenges such as the dramatic increase in the number of parameters, increased time complexity, and a strong dependence on GPUs for effective spatio-temporal feature extraction. The training speed can be considerably slow without the support of powerful GPU hardware. To address these issues, this study proposes an Adaptive Time Compression (ATC) module. Functioning as an independent component, ATC can be seamlessly integrated into existing architectures and achieves data compression by eliminating redundant frames within video data. The ATC module effectively reduces GPU computing load and time complexity with negligible loss of accuracy, thereby facilitating real-time human behavior recognition.
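To make the abstract's point about parameter growth concrete, the short sketch below counts the learnable parameters of a standard convolution layer for a 2D and a 3D kernel. The layer sizes (3-wide kernels, 64 input and 64 output channels) and the helper name `conv_params` are illustrative assumptions and are not taken from the paper.

```python
def conv_params(kernel_dims, c_in, c_out, bias=True):
    """Count learnable parameters of a standard (non-grouped) convolution layer."""
    weights = c_in * c_out
    for k in kernel_dims:
        weights *= k  # one factor per kernel dimension
    return weights + (c_out if bias else 0)


# Hypothetical layer sizes, chosen only for illustration.
params_2d = conv_params((3, 3), c_in=64, c_out=64)
params_3d = conv_params((3, 3, 3), c_in=64, c_out=64)

print(params_2d)  # 36928
print(params_3d)  # 110656
```

Adding the temporal kernel dimension multiplies the weight count by the temporal kernel size (here 3), and the layer must also be evaluated at every temporal position of the clip, which is why reducing the number of input frames lowers the compute cost.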

multidisciplinary sciences

What problem does this paper attempt to address?

The paper primarily focuses on addressing the problem of human action recognition in video data analysis, particularly the issue of increased computational complexity and the number of parameters when using 3D convolution. Specifically, the paper proposes an Adaptive Time Compression (ATC) module, which can be seamlessly integrated into existing architectures to achieve data compression by removing redundant frames from video data. The ATC module can significantly reduce GPU computational load and time complexity with minimal impact on accuracy, thereby facilitating real-time human action recognition.

The main contributions of the paper are as follows:

1. **Seamlessly Integrated ATC Module**: Capable of compressing datasets by removing redundant video frames with minimal information loss.
2. **Reduction in Sample Quantity**: Reduces the number of training and testing samples through data compression, thereby lowering the computational load and time complexity of the model.
3. **Experimental Validation**: Experimental results show that this method improves experimental efficiency and model performance with almost no loss in accuracy.

The paper validates the effectiveness of the ATC module through comparative experiments, particularly its performance on the UCF101 and Kinetics datasets, demonstrating its significant advantages in improving model efficiency.
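The general idea of removing redundant frames before a clip reaches the recognition model can be sketched as follows. This summary does not describe how ATC actually decides which frames are redundant, so the criterion used here (mean absolute pixel difference against the last kept frame, compared with a fixed threshold) is an assumption made purely for illustration; the function name `drop_redundant_frames` and the `threshold` parameter are likewise hypothetical.

```python
import numpy as np


def drop_redundant_frames(video, threshold=0.02):
    """Keep a frame only if it differs enough from the last kept frame.

    video: array of shape (T, H, W, C) with values in [0, 1].
    threshold: hypothetical cutoff on the mean absolute pixel difference.
    Returns the compressed clip and the indices of the kept frames.
    """
    video = np.asarray(video, dtype=np.float32)
    kept = [0]  # always keep the first frame
    for t in range(1, len(video)):
        diff = np.mean(np.abs(video[t] - video[kept[-1]]))
        if diff > threshold:
            kept.append(t)
    return video[kept], kept


# Synthetic clip: four identical dark frames followed by four identical bright frames.
clip = np.concatenate([np.zeros((4, 4, 4, 3)), np.ones((4, 4, 4, 3))])
compressed, kept = drop_redundant_frames(clip)

print(kept)              # [0, 4]
print(compressed.shape)  # (2, 4, 4, 3)
```

In this toy clip, eight frames are reduced to two, so a downstream 3D convolutional model would process a quarter of the original temporal length. A fixed threshold is the simplest possible choice; an adaptive scheme would presumably set the cutoff per video.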

Motion Guided Token Compression for Efficient Masked Video Modeling

AdaFocusV3: On Unified Spatial-temporal Dynamic Video Recognition

TEINet: Towards an Efficient Architecture for Video Recognition.

Design Light-weight 3D Convolutional Networks for Video Recognition Temporal Residual, Fully Separable Block, and Fast Algorithm

Spatiotemporal Attention-based Semantic Compression for Real-time Video Recognition

TAM: Temporal Adaptive Module for Video Recognition

A Real-Time Action Representation With Temporal Encoding and Deep Compression

3D-TDC: A 3D temporal dilation convolution framework for video action recognition

Adaptive Focus for Efficient Video Recognition

AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition

Accelerated Event-Based Feature Detection and Compression for Surveillance Video Systems

Human Behavior Recognition Based on Attention Mechanism and Bottleneck Residual Dual-Path Spatiotemporal Graph Convolutional Network

Grouped Spatial-Temporal Aggregation for Efficient Action Recognition

Spatio-Temporal Collaborative Module for Efficient Action Recognition

2D or Not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition

High Performance Gesture Recognition Via Effective and Efficient Temporal Modeling.

Edge-Assisted Real-Time Video Analytics with Spatial–Temporal Redundancy Suppression

Abnormal behavior capture of video dynamic target based on 3D convolutional neural network

Temporal-attentive Covariance Pooling Networks for Video Recognition

3D Human Activity Recognition with Reconfigurable Convolutional Neural Networks