Abstract: Spatial-temporal modeling is crucial for action recognition in videos within the field of artificial intelligence. However, robustly extracting motion information remains a primary challenge because appearances deform over time and motion frequencies vary between actions. To address these issues, we propose an effective method called the Motion Sensitive Network (MSN), which incorporates the theory of artificial neural networks and key concepts of autonomous system control and decision-making. Specifically, we employ a Spatial-Temporal Pyramid Motion Extraction (STP-ME) module that adjusts convolution kernel sizes and time intervals synchronously to gather motion information at different temporal scales, in line with the learning and prediction characteristics of artificial neural networks. Additionally, we introduce a new module called Variable Scale Motion Excitation (DS-ME), which uses a differential model to capture motion information, matching the flexibility of autonomous system control. In particular, we employ a multi-scale deformable convolutional network to alter the motion scale of the target object before computing temporal differences across consecutive frames, providing theoretical support for the flexibility of autonomous systems. Temporal modeling is a crucial step in understanding environmental changes and actions within autonomous systems, and MSN, by integrating the advantages of artificial neural networks (ANNs) in this task, provides an effective framework for their future use in autonomous systems. We evaluate the proposed method on three challenging action recognition datasets (Kinetics-400, Something-Something V1, and Something-Something V2). The results show accuracy improvements of 1.1% to 2.2% on the test set. Compared with state-of-the-art (SOTA) methods, the proposed approach achieves a maximum performance of 89.90%. In ablation experiments, the performance gain attributable to the proposed module ranges from 2% to 5.3%. MSN demonstrates significant potential in various challenging scenarios and provides an initial exploration into integrating artificial neural networks into the domain of autonomous systems.
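The idea of pairing a spatial kernel size with a temporal interval, as the abstract describes for STP-ME, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a plain box filter in place of learned convolutions, and the function names (box_filter, temporal_difference, pyramid_motion) and the (kernel size, interval) pairs are hypothetical choices made only for illustration.

```python
# Hypothetical sketch: multi-scale temporal differences over a tiny "video".
# A video is a list of frames; each frame is a 2D list of floats (H x W).
# A box filter stands in for the learned convolutions of a real network.

def box_filter(frame, k):
    """Average each pixel over a k x k window (k odd, zero padding)."""
    h, w = len(frame), len(frame[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += frame[yy][xx]
            out[y][x] = total / (k * k)
    return out


def temporal_difference(video, interval):
    """Per-pixel difference between frames that are `interval` steps apart."""
    return [
        [[b - a for a, b in zip(row_a, row_b)]
         for row_a, row_b in zip(video[t], video[t + interval])]
        for t in range(len(video) - interval)
    ]


def pyramid_motion(video, scales=((1, 1), (3, 2), (5, 3))):
    """For each assumed (kernel_size, interval) pair, smooth every frame
    with that kernel, then difference frames at that interval."""
    features = {}
    for k, interval in scales:
        smoothed = [box_filter(frame, k) for frame in video]
        features[(k, interval)] = temporal_difference(smoothed, interval)
    return features


def make_frame(col, size=3):
    """A size x size frame of zeros with one bright pixel in the middle row."""
    frame = [[0.0] * size for _ in range(size)]
    frame[size // 2][col] = 1.0
    return frame


# A bright pixel moves right for two steps, then stays still.
video = [make_frame(0), make_frame(1), make_frame(2), make_frame(2)]
feats = pyramid_motion(video)
```

In this toy setup the finest scale responds to the frame-to-frame shift, while the coarser scales compare more heavily smoothed frames over longer intervals, so slower or larger motions are still registered. A real network would learn the filters and fuse the scales rather than store them in a dictionary.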

Scene adaptive mechanism for action recognition

ActionCLIP: Adapting Language-Image Pretrained Models for Video Action Recognition

Revisiting the Spatial and Temporal Modeling for Few-shot Action Recognition

SAM: Modeling Scene, Object and Action with Semantics Attention Modules for Video Recognition

AE-Net: Adjoint Enhancement Network for Efficient Action Recognition in Video Understanding

Action Genome: Actions as Composition of Spatio-temporal Scene Graphs

B2C-AFM: Bi-Directional Co-Temporal and Cross-Spatial Attention Fusion Model for Human Action Recognition

Spatio-Temporal Adaptive Network with Bidirectional Temporal Difference for Action Recognition

ACTION-Net: Multipath Excitation for Action Recognition

Learning Comprehensive Motion Representation for Action Recognition

CANet: Comprehensive Attention Network for video-based action recognition

Action recognition method based on a novel keyframe extraction method and enhanced 3D convolutional neural network

Joint Network based Attention for Action Recognition

Action-Stage Emphasized Spatiotemporal VLAD for Video Action Recognition

Temporal Attentive Network for Action Recognition

Action recognition using attention-based spatio-temporal VLAD networks and adaptive video sequences optimization

EPAM-Net: An Efficient Pose-driven Attention-guided Multimodal Network for Video Action Recognition

MIE-Net: Motion Information Enhancement Network for Fine-Grained Action Recognition Using RGB Sensors

Multi-View Region Adaptive Multi-temporal DMM and RGB Action Recognition

SMAM: Self and Mutual Adaptive Matching for Skeleton-Based Few-Shot Action Recognition

Motion sensitive network for action recognition in control and decision-making of autonomous systems