Abstract: Human action recognition has made remarkable progress, and its results are already applied in daily life. However, most methods extract features from only a single view within each modality, which may not comprehensively capture the diversity and complexity of actions. Moreover, ineffective removal of redundant information can obscure the key information. Both issues can reduce the final recognition accuracy. To address them, this paper proposes a novel method for single-subject routine action recognition that combines multi-view key information representation with multi-modal fusion. First, motion mean normalization reduces the energy of non-primary motion areas in the depth video sequence, thereby enhancing the key information of the action. Then, a depth motion history map (DMHM) and a depth spatio-temporal energy map (DSTEM) are extracted from planes and axes, respectively. The proposed DMHM preserves the spatio-temporal information of actions, while DSTEM preserves the motion contour and energy information. For skeleton sequences, statistical features and the motion contribution degree of each joint are extracted from the views of motion distribution and weights, respectively. Finally, depth and skeleton features are fused to achieve multi-modal fusion-based action recognition. The proposed method highlights the information of the main motion areas and achieves recognition accuracies of 96.70% on MSR-Action3D, 93.26% on UTD-MHAD, and above 97.73% on all tests of CZU-MHAD. The experimental results demonstrate that the proposed method effectively preserves action information and achieves better recognition accuracy than most existing methods.
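The abstract does not define motion mean normalization or the DMHM, so the following is only a rough sketch of the general idea, not the authors' formulation. It assumes a depth sequence stored as a `(T, H, W)` array, uses the classic motion history update (recently moving pixels are bright, older motion decays), and reads "mean normalization" as subtracting the mean motion energy and clipping at zero. The function names, the `threshold` and `decay` parameters, and that reading are all assumptions made for illustration.

```python
import numpy as np


def suppress_below_mean(energy):
    """Assumed reading of mean normalization: subtract the mean motion
    energy and clip at zero, so weakly moving areas are zeroed out."""
    energy = np.asarray(energy, dtype=np.float64)
    return np.clip(energy - energy.mean(), 0.0, None)


def motion_history_map(frames, threshold=10.0, decay=1.0):
    """Classic motion history update over a (T, H, W) depth sequence.

    A pixel whose depth changes by more than `threshold` between two
    consecutive frames is set to the maximum value; otherwise its value
    decays by `decay`. The result is scaled to [0, 1]. This is a generic
    motion history image, not the paper's DMHM definition.
    """
    frames = np.asarray(frames, dtype=np.float64)
    n_frames = frames.shape[0]
    if n_frames < 2:
        raise ValueError("need at least two frames")
    tau = float(n_frames - 1)
    mhm = np.zeros(frames.shape[1:], dtype=np.float64)
    for t in range(1, n_frames):
        moving = np.abs(frames[t] - frames[t - 1]) > threshold
        mhm = np.where(moving, tau, np.maximum(mhm - decay, 0.0))
    return mhm / tau


# Toy example: a 4-frame, 2x2 depth sequence (hypothetical data).
frames = np.zeros((4, 2, 2))
frames[1:, 0, 0] = 100.0   # pixel (0, 0) moves early, at t = 1
frames[3, 1, 1] = 100.0    # pixel (1, 1) moves late, at t = 3
history = motion_history_map(frames)
```

In this toy sequence the late-moving pixel ends with the highest value and the early-moving pixel with a decayed one, which is the temporal ordering a motion history map is meant to encode.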

Multi-layer Representation for Cross-view Action Recognition

Cross-modality Online Distillation for Multi-View Action Recognition

Multi-view daily action recognition based on Hooke balanced matrix and broad learning system

View-invariant Human Action Recognition Via Robust Locally Adaptive Multi-View Learning

Collaborative Attention Mechanism for Multi-View Action Recognition

Multi-view representation learning for multi-view action recognition

Cross-view Action Recognition via Contrastive View-invariant Representation

Hierarchically Learned View-Invariant Representations for Cross-View Action Recognition

Multi-View Region Adaptive Multi-temporal DMM and RGB Action Recognition

Shifting Perspective to See Difference: A Novel Multi-View Method for Skeleton Based Action Recognition

Multi-view key information representation and multi-modal fusion for single-subject routine action recognition

Cross-view Action Modeling, Learning and Recognition

CAMVR: Context-Adaptive Multi-View Representation Learning for Dense Retrieval

Hypergraph-Based Multi-View Action Recognition Using Event Cameras

B2C-AFM: Bi-Directional Co-Temporal and Cross-Spatial Attention Fusion Model for Human Action Recognition

Continuous Multi-View Human Action Recognition

Multi-Domain and Multi-Task Learning for Human Action Recognition

Multi-View Time-Series Hypergraph Neural Network for Action Recognition

Annealing Temporal-Spatial Contrastive Learning for Multi-View Online Action Detection

DVANet: Disentangling View and Action Features for Multi-View Action Recognition