Abstract: Tennis has become an increasingly popular sport throughout the world, and tennis motion recognition based on 3D video has attracted growing attention in recent years. Algorithms based on dynamic time warping (DTW) take the temporal ordering of movements into account and can handle the temporal uncertainty of human movement. However, their efficiency decreases as the number of training samples grows. This work presents a tennis action recognition framework based on action standard sequences. The 3D action video samples are converted into action sequences by feature extraction, and learning the action standard sequence is formulated as a sequence averaging optimization problem under the DTW metric. The DTW barycenter averaging (DBA) algorithm is used to solve this problem. For tennis scenarios in which the action categories differ significantly, we study standard sequence learning for multiple actions and accordingly propose a DBA-K-means clustering algorithm for unsupervised learning. We also propose a human tennis action recognition method that integrates feature optimization and image similarity. Three dimensionality reduction methods, namely principal component analysis (PCA), PCA + Pearson, and PCA + Spearman, were compared, and PCA combined with the Pearson correlation coefficient gave the best dimensionality reduction. The global eight-star model feature is then combined with the dimensionality-reduced local HOG feature to fully represent human movements. The similarity between each pair of adjacent frames is calculated, the statistical weights of the single-frame SVM classification results within a discrimination period are allocated adaptively, and the body pose recognition results are then classified a second time. Experiments on the standard KTH data set show that the algorithm achieves a recognition accuracy of 94.5%, higher than that of the compared methods.
The method therefore has good application value in video-based human motion recognition. We also demonstrate that it can further improve the efficiency and accuracy of action recognition, and that effective feature extraction improves the accuracy of subsequent human action recognition.
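To make the role of the DTW metric and the action standard sequences concrete, the following is a minimal sketch and not the authors' implementation. It assumes each action has already been reduced to a one-dimensional feature sequence and that one standard sequence per action class is available (for example, as produced by DBA). The function names `dtw_distance` and `classify`, the class labels, and the toy sequences are hypothetical and chosen only for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Classic dynamic time warping cost between two 1-D sequences.

    Uses the absolute difference as the local cost and the standard
    step pattern (match, insertion, deletion). Runs in O(len(a) * len(b)).
    """
    n, m = len(a), len(b)
    # cost[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] aligned again with an earlier b
                cost[i][j - 1],      # b[j-1] aligned again with an earlier a
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def classify(sequence, standards):
    """Return the label whose standard sequence is closest under DTW.

    `standards` maps a (hypothetical) action label to its standard sequence.
    """
    return min(standards, key=lambda label: dtw_distance(sequence, standards[label]))


# Toy standard sequences (assumed data, not taken from the paper).
standards = {
    "forehand": [0, 1, 2, 1, 0],
    "serve": [0, 2, 4, 2, 0],
}

# A slower forehand: same shape, one frame repeated.
query = [0, 1, 1, 2, 1, 0]
predicted = classify(query, standards)
```

Comparing a query against one standard sequence per class, rather than against every training sample, is what keeps the matching cost independent of the training set size, which is the efficiency concern the abstract raises.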
