Abstract: Human action recognition is an active topic in computer vision, driven by advances in deep learning. Deep learning models typically take raw video as input, without prior processing, and are trained directly on it to recognize actions. However, a preliminary motion analysis can help direct training toward the motion of the individuals rather than the environment in which the action occurs. This paper proposes a novel methodology for human action recognition that is based on motion information and employs transfer-learning techniques. The proposed method comprises four stages: (1) human detection and tracking, (2) motion estimation, (3) feature extraction, and (4) action recognition using a two-stream model. To develop this work, a customized dataset was used, comprising videos of diverse actions (e.g., walking, running, cycling, drinking, and falling) extracted from multiple public sources and websites, including Pexels and MixKit. This realistic and diverse dataset allowed for a comprehensive evaluation of the proposed method, demonstrating its effectiveness in different scenarios and conditions. In addition, the performance of seven pre-trained models for feature extraction was evaluated: Inception-v3, MobileNet-v2, MobileNet-v3-L, VGG-16, VGG-19, Xception, and ConvNeXt-L. The ConvNeXt-L model yielded the best results. Using pre-trained models for feature extraction also made it possible to train on a personal computer with a single graphics processing unit, achieving an accuracy of 94.9%. The experimental findings suggest that integrating motion information enhances action recognition performance.
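The motion-estimation and two-stream stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how motion is estimated or how the two streams are combined, so frame differencing, weighted averaging of softmax scores, and the names `frame_difference_motion`, `two_stream_predict`, and `motion_weight` are all assumptions made for illustration. Human detection, tracking, and feature extraction with a pre-trained network are omitted, and the per-stream logits are assumed to be given.

```python
import numpy as np

# Action labels taken from the examples listed in the abstract.
ACTIONS = ["walking", "running", "cycling", "drinking", "falling"]


def frame_difference_motion(frames):
    """Crude motion estimate (an assumption, not the paper's method):
    absolute difference between consecutive grayscale frames.

    frames: array of shape (T, H, W); returns shape (T - 1, H, W).
    """
    frames = np.asarray(frames, dtype=np.float32)
    return np.abs(np.diff(frames, axis=0))


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def two_stream_predict(appearance_logits, motion_logits, motion_weight=0.5):
    """Late fusion (assumed): weighted average of the class probabilities
    produced by the appearance stream and the motion stream.

    Returns the fused probabilities and the index of the predicted class.
    """
    fused = ((1.0 - motion_weight) * softmax(appearance_logits)
             + motion_weight * softmax(motion_logits))
    return fused, int(np.argmax(fused))


if __name__ == "__main__":
    # Hypothetical logits: the appearance stream favors "walking",
    # the motion stream favors "falling" more strongly.
    fused, idx = two_stream_predict([2.0, 0.0, 0.0, 0.0, 0.0],
                                    [0.0, 0.0, 0.0, 0.0, 3.0])
    print(ACTIONS[idx], fused.round(3))
```

With equal weights, the stream that is more confident dominates the fused prediction; raising `motion_weight` shifts the decision further toward the motion stream, which is one simple way to express the abstract's idea of prioritizing motion over the surrounding scene.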
