Abstract: Human action recognition is an active research topic in computer vision, driven by advances in deep learning. Deep learning models are typically trained directly on raw video, without prior processing. However, preliminary motion analysis can steer training toward the motion of the individuals and away from the environment in which the action occurs. This paper proposes a novel method for human action recognition that is based on motion information and employs transfer learning. The method comprises four stages: (1) human detection and tracking, (2) motion estimation, (3) feature extraction, and (4) action recognition using a two-stream model. To develop this work, a custom dataset was used, comprising videos of diverse actions (e.g., walking, running, cycling, drinking, and falling) collected from multiple public sources and websites, including Pexels and MixKit. This realistic and diverse dataset allowed a comprehensive evaluation of the proposed method, demonstrating its effectiveness in different scenarios and conditions. The performance of seven pre-trained models for feature extraction was also evaluated: Inception-v3, MobileNet-v2, MobileNet-v3-L, VGG-16, VGG-19, Xception, and ConvNeXt-L. Of these, ConvNeXt-L yielded the best results. Moreover, using pre-trained models for feature extraction made it possible to train on a personal computer with a single graphics processing unit, achieving an accuracy of 94.9%. The experimental findings suggest that integrating motion information enhances action recognition performance.
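
To illustrate how a two-stream model can combine appearance and motion evidence in the final recognition stage, the following is a minimal sketch of late fusion, assuming each stream outputs one raw score per action class. The abstract does not state how the two streams are fused, so the weighted averaging used here, the function names (`softmax`, `fuse_streams`, `predict`), the `motion_weight` parameter, and the example scores are all illustrative assumptions rather than the authors' implementation. Only the five action names come from the abstract.

```python
import math

# Action classes named in the abstract; their order here is an assumption.
ACTIONS = ["walking", "running", "cycling", "drinking", "falling"]


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(appearance_logits, motion_logits, motion_weight=0.5):
    """Late fusion: weighted average of the two streams' class probabilities.

    motion_weight is a hypothetical knob; 0.0 ignores the motion stream
    and 1.0 ignores the appearance stream.
    """
    if len(appearance_logits) != len(motion_logits):
        raise ValueError("both streams must score the same classes")
    if not 0.0 <= motion_weight <= 1.0:
        raise ValueError("motion_weight must lie in [0, 1]")
    p_app = softmax(appearance_logits)
    p_mot = softmax(motion_logits)
    return [
        (1.0 - motion_weight) * a + motion_weight * m
        for a, m in zip(p_app, p_mot)
    ]


def predict(appearance_logits, motion_logits, motion_weight=0.5):
    """Return the most probable action label and its fused probability."""
    fused = fuse_streams(appearance_logits, motion_logits, motion_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return ACTIONS[best], fused[best]


# Made-up scores: appearance alone slightly favors "walking",
# while the motion stream clearly favors "running".
appearance = [2.0, 1.8, 0.1, 0.0, -1.0]
motion = [0.5, 3.0, 0.2, -0.5, 0.0]

label, confidence = predict(appearance, motion)
appearance_only_label, _ = predict(appearance, motion, motion_weight=0.0)
```

In this toy case the appearance scores alone pick "walking", whereas the fused prediction picks "running", which mirrors the abstract's claim that motion information can resolve cases where appearance is ambiguous. Other fusion strategies, such as concatenating the two feature vectors before a shared classifier, would fit the same description equally well.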