Abstract: In this paper, we conduct an in-depth study of sports video recognition using an improved hidden Markov model. The feature module is a complex gesture recognition module based on hidden Markov model gesture features: it applies these features to gesture recognition and, building on simple gesture recognition, recognizes complex gestures composed of simple ones. The combination of the two modules forms the overall technology of this paper, which can be applied to many scenarios, including special high-security scenarios that require real-time feedback and public indoor scenarios, where it can provide different prevention measures and services for different age groups. Experimental performance improves as the depth of the feature extraction network increases; however, a two-dimensional convolutional neural network loses temporal information when extracting features, so this paper uses a three-dimensional convolutional network to extract features from the video in both time and space. Multiple binary classifications are performed on the extracted features to achieve multilabel classification. A multistream residual neural network extracts features from video data of three modalities, and the extracted feature vectors are fed into an attention mechanism network, which selects the information most critical for video recognition from the large amount of spatiotemporal information and further learns the temporal dependencies between consecutive video frames; finally, the multistream network outputs are fused to obtain the final prediction category. With the model trained and optimized end to end, recognition accuracies of 92.7% and 64.4%, respectively, are achieved on the evaluation data.
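To make the idea of recognizing a complex gesture from a sequence of simple gestures concrete, the following is a minimal sketch using a plain discrete hidden Markov model and the scaled forward algorithm. It is not the improved model described in the abstract. The gesture names, the two-symbol alphabet (0 and 1 standing for two simple gestures), and all probabilities are hypothetical values chosen only for illustration.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | HMM) using the scaled forward algorithm."""
    if not obs:
        raise ValueError("observation sequence must be non-empty")
    n_states = len(start)
    # Initialise with the start distribution and the first emission.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emission.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik

def classify(obs, models):
    """Pick the complex gesture whose HMM assigns the highest likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))

# Hypothetical left-to-right models: (start, transition, emission).
# Simple-gesture symbols: 0 = "raise", 1 = "swing" (illustrative names only).
START = [1.0, 0.0]
TRANS = [[0.6, 0.4], [0.0, 1.0]]
MODELS = {
    "raise_then_swing": (START, TRANS, [[0.9, 0.1], [0.1, 0.9]]),
    "swing_then_raise": (START, TRANS, [[0.1, 0.9], [0.9, 0.1]]),
}

if __name__ == "__main__":
    print(classify([0, 0, 1, 1], MODELS))
    print(classify([1, 1, 0, 0], MODELS))
```

In this sketch each complex gesture has its own model, and classification reduces to comparing sequence likelihoods. In a real system the symbols would come from a simple-gesture recognizer and the parameters would be learned from data rather than set by hand.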

Sports Video Captioning by Attentive Motion Representation based Hierarchical Recurrent Neural Networks

Sports Video Captioning via Attentive Motion Representation and Group Relationship Modeling

Fine-Grained Video Captioning for Sports Narrative

Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks

MAM-RNN: Multi-level Attention Model Based RNN for Video Captioning

Hidden Markov Model-Based Video Recognition for Sports

Sports Video Analysis on Large-Scale Data

Memory-Attended Recurrent Network For Video Captioning

CAM-RNN: Co-Attention Model Based RNN for Video Captioning

Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning

Research on Volleyball Video Intelligent Description Technology Combining the Long-Term and Short-Term Memory Network and Attention Mechanism

Dual-Stream Recurrent Neural Network for Video Captioning

Sports Video Classification Method Based on Improved Deep Learning

Video Captioning with Transferred Semantic Attributes

SportsCap: Monocular 3D Human Motion Capture and Fine-grained Understanding in Challenging Sports Videos

Rich Visual and Language Representation with Complementary Semantics for Video Captioning

Learning Group Interaction for Sports Video Understanding from a Perspective of Athlete

A simple yet effective knowledge guided method for entity-aware video captioning on a basketball benchmark

Video captioning with recurrent networks based on frame- and video-level features and visual content classification

Spatio-Temporal Ranked-Attention Networks for Video Captioning

Video Captioning Via Two-stage Attention Model and Generative Adversarial Network