Abstract: Human action recognition is an active research topic in computer vision, driven by advances in deep learning. Deep learning models typically take raw video as input, without prior processing, and are trained directly on it to recognize actions. However, a preliminary motion analysis can steer training toward the motion of the individuals and away from the environment in which the action occurs. This paper proposes a novel methodology for human action recognition that is based on motion information and employs transfer-learning techniques. The proposed method comprises four stages: (1) human detection and tracking, (2) motion estimation, (3) feature extraction, and (4) action recognition using a two-stream model. To develop this work, a custom dataset was used, comprising videos of diverse actions (e.g., walking, running, cycling, drinking, and falling) collected from multiple public sources and websites, including Pexels and MixKit. This realistic and diverse dataset allowed for a comprehensive evaluation of the proposed method, demonstrating its effectiveness in different scenarios and conditions. In addition, the performance of seven pre-trained models for feature extraction was evaluated: Inception-v3, MobileNet-v2, MobileNet-v3-L, VGG-16, VGG-19, Xception, and ConvNeXt-L. Of these, ConvNeXt-L yielded the best results. Using pre-trained models for feature extraction also made it feasible to train on a personal computer with a single graphics processing unit, achieving an accuracy of 94.9%. The experimental findings suggest that integrating motion information enhances action recognition performance.
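
The abstract does not state how the two streams of stage (4) are combined. As a rough illustration only, the sketch below assumes late fusion: each stream produces per-class scores, the scores are turned into probabilities, and the two probability vectors are averaged with a weight. The function names, the `motion_weight` parameter, and the use of the five example actions as the full label set are all hypothetical and are not taken from the paper.

```python
import math

# Example action labels mentioned in the abstract; treating them as the
# complete label set is an assumption made for this sketch.
ACTIONS = ["walking", "running", "cycling", "drinking", "falling"]


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_streams(appearance_logits, motion_logits, motion_weight=0.5):
    """Weighted average of the per-class probabilities of both streams.

    Late fusion is an assumed design; the paper may fuse differently.
    """
    p_appearance = softmax(appearance_logits)
    p_motion = softmax(motion_logits)
    return [
        (1.0 - motion_weight) * a + motion_weight * m
        for a, m in zip(p_appearance, p_motion)
    ]


def predict(appearance_logits, motion_logits, motion_weight=0.5):
    """Return the most likely action label and its fused probability."""
    fused = fuse_two_streams(appearance_logits, motion_logits, motion_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return ACTIONS[best], fused[best]
```

With an uninformative appearance stream (all scores equal) and a motion stream that strongly favours the last class, `predict([0, 0, 0, 0, 0], [0, 0, 0, 0, 3])` selects "falling", which shows how the motion stream can decide the outcome when appearance alone is ambiguous.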

Action Recognition in Videos through a Transfer-Learning-Based Technique

ActionCLIP: Adapting Language-Image Pretrained Models for Video Action Recognition.

Human Action Recognition From Digital Videos Based on Deep Learning.

Human Action Recognition Using Deep Learning Methods.

Deep Learning-Based Human Action Recognition in Videos

Action Recognition By Learning Deep Multi-Granular Spatio-Temporal Video Representation

Human action recognition using attention based LSTM network with dilated CNN features

Channel Attention-Based Approach with Autoencoder Network for Human Action Recognition in Low-Resolution Frames

A very deep sequences learning approach for human action recognition

Video-Based Human Activity Recognition Using Deep Learning Approaches

Fast and Reliable Human Action Recognition in Video Sequences by Sequential Analysis

Video-based Human Action Recognition using Deep Learning: A Review

Learning and Recognizing Human Action from Skeleton Movement with Deep Residual Neural Networks

Action Recognition Based on Object Tracking and Dense Trajectories

Video Action Recognition Using spatio-temporal optical flow video frames

A data augmentation method for human action recognition using dense joint motion images

Human Action Recognition in Videos using Convolution Long Short-Term Memory Network with Spatio-Temporal Networks

Improved Convolutional 3D Networks for Micro-Movements Recognition

Deep Convolutional Neural Networks for Action Recognition Using Depth Map Sequences

Evaluating the Performance of Mobile-Convolutional Neural Networks for Spatial and Temporal Human Action Recognition Analysis

Learning to Recognize 3D Human Action from A New Skeleton-based Representation Using Deep Convolutional Neural Networks