Abstract: Human action recognition has made remarkable progress, and its results have been applied in daily life. However, most methods extract features from only a single view within each modality, which may not fully capture the diversity and complexity of actions. Moreover, failure to remove redundant information effectively can obscure the key information. These issues can degrade the final recognition accuracy. To address them, this paper proposes a novel method for single-subject routine action recognition that combines multi-view key information representation with multi-modal fusion. First, motion mean normalization is applied to the depth video sequence to reduce the energy of non-primary motion areas, thereby enhancing the key action information. Then, a depth motion history map (DMHM) and a depth spatio-temporal energy map (DSTEM) are extracted from planes and axes, respectively. The proposed DMHM effectively preserves the spatio-temporal information of actions, while DSTEM preserves the motion contour and energy information. For skeleton sequences, statistical features and the motion contribution degree of each joint are extracted from the perspectives of motion distribution and weights, respectively. Finally, the depth and skeleton features are fused for multi-modal action recognition. The proposed method highlights the information of the main motion areas. It achieves recognition accuracies of 96.70% on MSR-Action3D, 93.26% on UTD-MHAD, and above 97.73% on all tests of CZU-MHAD. The experimental results show that the proposed method effectively preserves action information and achieves higher recognition accuracy than most existing methods.
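The pipeline summarized above can be illustrated with a minimal sketch. The abstract does not define motion mean normalization, DMHM, DSTEM or the motion contribution degree precisely, so everything below is an assumption and not the authors' implementation.

The assumptions are as follows:

- **Motion mean normalization** is approximated by a per-pixel weight: the mean absolute frame difference, scaled to [0, 1].
- **DMHM** is approximated by a classic motion-history-image update on the front (x-y) plane only. Side and top projections and DSTEM are omitted.
- **Motion contribution degree** is taken to be each joint's share of the total joint displacement.
- **Fusion** is L2-normalized feature concatenation.

All function names and the `thresh` value are hypothetical.

```python
import numpy as np


def motion_mean_weight(depth: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for motion mean normalization.

    depth: (T, H, W) depth sequence. Returns an (H, W) weight in [0, 1]
    that is large where the mean absolute frame difference is large, so
    multiplying by it suppresses non-primary motion areas.
    """
    diffs = np.abs(np.diff(depth.astype(np.float64), axis=0))  # (T-1, H, W)
    mean_motion = diffs.mean(axis=0)
    peak = mean_motion.max()
    return mean_motion / peak if peak > 0 else mean_motion


def depth_motion_history(depth: np.ndarray, thresh: float = 10.0) -> np.ndarray:
    """MHI-style history map on the front plane only, scaled to [0, 1].

    Pixels that moved recently are near 1; older motion decays linearly.
    (Assumed form; the paper's DMHM also uses other planes.)
    """
    depth = depth.astype(np.float64)
    tau = depth.shape[0] - 1  # decay horizon = number of frame differences
    history = np.zeros(depth.shape[1:])
    for t in range(1, depth.shape[0]):
        moving = np.abs(depth[t] - depth[t - 1]) > thresh
        history = np.where(moving, tau, np.maximum(history - 1, 0))
    return history / tau if tau > 0 else history


def joint_contribution(skeleton: np.ndarray) -> np.ndarray:
    """Assumed motion contribution degree: each joint's share of displacement.

    skeleton: (T, J, 3) joint coordinates. Returns (J,) weights summing to 1.
    """
    step = np.linalg.norm(np.diff(skeleton, axis=0), axis=2)  # (T-1, J)
    per_joint = step.sum(axis=0)
    total = per_joint.sum()
    if total == 0:
        return np.full(skeleton.shape[1], 1.0 / skeleton.shape[1])
    return per_joint / total


def joint_statistics(skeleton: np.ndarray) -> np.ndarray:
    """Mean and standard deviation of every joint coordinate over time."""
    return np.concatenate([skeleton.mean(axis=0).ravel(), skeleton.std(axis=0).ravel()])


def fuse(*features: np.ndarray) -> np.ndarray:
    """Feature-level fusion: L2-normalize each feature vector, then concatenate."""
    parts = []
    for f in features:
        f = np.ravel(f).astype(np.float64)
        n = np.linalg.norm(f)
        parts.append(f / n if n > 0 else f)
    return np.concatenate(parts)


# Toy example with synthetic data: one depth pixel changes at frame 3.
depth = np.full((5, 4, 4), 1000.0)
depth[3:, 1, 2] = 800.0
dmhm = depth_motion_history(depth) * motion_mean_weight(depth)  # dmhm[1, 2] == 0.75

# Two joints moving along x: joint 0 travels 1 unit, joint 1 travels 3 units.
skeleton = np.zeros((3, 2, 3))
skeleton[:, 0, 0] = [0.0, 0.0, 1.0]
skeleton[:, 1, 0] = [0.0, 1.0, 3.0]
contribution = joint_contribution(skeleton)  # -> [0.25, 0.75]

fused = fuse(dmhm, joint_statistics(skeleton), contribution)  # 16 + 12 + 2 = 30 values
```

Normalizing each modality before concatenation keeps the larger depth map from dominating the shorter skeleton vectors. The paper may weight or combine the modalities differently.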

Multi-view key information representation and multi-modal fusion for single-subject routine action recognition