Abstract: Semi-supervised learning for video action recognition is a challenging research area. Existing state-of-the-art methods apply data augmentation along the temporal dimension of actions and combine it with FixMatch, the mainstream consistency-based semi-supervised learning framework. However, these approaches have two limitations: (1) data augmentation based on video clips lacks coarse-grained and fine-grained representations of actions in temporal sequences, so models have difficulty recognizing that an action in different motion phases carries the same meaning; (2) pseudo-label selection based on constant thresholds lacks a "make-up curriculum" for difficult actions, which results in low utilization of the unlabeled data corresponding to those actions. To address these shortcomings, we propose TACL, a semi-supervised action recognition algorithm based on temporal augmentation using curriculum learning. Compared with previous work, TACL explores different representations of the same action semantics in video temporal sequences and uses the idea of curriculum learning (CL) to reduce the difficulty of model training. First, for different action expressions with the same semantics, we design temporal action augmentation (TAA) for videos, which obtains coarse-grained and fine-grained action expressions using constant-velocity and hetero-velocity methods, respectively. Second, we construct a temporal signal that constrains the model so that fine-grained action expressions containing different movement phases receive the same prediction, and we achieve action consistency learning (ACL) by combining this signal with the label and pseudo-label signals. Finally, we propose action curriculum pseudo labeling (ACPL), a dynamic threshold evaluation algorithm that applies loose and strict thresholds in parallel to select and label unlabeled data. We evaluate TACL on three standard public datasets: UCF101, HMDB51, and Kinetics. Experiments show that TACL significantly improves the accuracy of models trained on a small amount of labeled data and better evaluates the learning effects for different actions.
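
To make the contrast between constant and curriculum-based thresholds concrete, the following is a minimal sketch, not the ACPL algorithm itself, whose details the abstract does not give. It compares a FixMatch-style constant threshold with a per-class threshold scaled by an estimated learning status, in the spirit of FlexMatch-style curriculum pseudo labeling, which is swapped in here as a stand-in. The function names, the value `tau = 0.95`, and the toy probabilities are all assumptions made for illustration.

```python
from collections import Counter


def constant_mask(probs, tau=0.95):
    """FixMatch-style selection: keep a sample only if its top confidence reaches tau."""
    return [max(p) >= tau for p in probs]


def class_thresholds(probs, num_classes, tau=0.95):
    """Per-class thresholds scaled by an estimated learning status.

    Stand-in rule (an assumption, not the paper's ACPL): a class's learning
    status is the number of unlabeled samples confidently predicted as that
    class, normalized by the best-learned class. Hard classes get a lower
    threshold, so more of their unlabeled samples are used.
    """
    counts = Counter()
    for p in probs:
        conf = max(p)
        if conf >= tau:
            counts[p.index(conf)] += 1
    top = max(counts.values(), default=0)

    thresholds = []
    for c in range(num_classes):
        beta = counts[c] / top if top else 0.0
        # Convex mapping beta / (2 - beta): thresholds rise slowly for poorly
        # learned classes. A class with no confident predictions gets 0 here,
        # which a real system would likely need to guard against.
        thresholds.append(tau * beta / (2.0 - beta))
    return thresholds


def curriculum_mask(probs, thresholds):
    """Keep a sample if its top confidence reaches the threshold of its predicted class."""
    return [max(p) >= thresholds[p.index(max(p))] for p in probs]


# Toy softmax outputs for 3 action classes (hypothetical values).
probs = [
    [0.97, 0.02, 0.01],  # easy class 0, confident
    [0.96, 0.03, 0.01],  # easy class 0, confident
    [0.02, 0.96, 0.02],  # harder class 1, confident
    [0.20, 0.70, 0.10],  # harder class 1, moderately confident
    [0.80, 0.15, 0.05],  # easy class 0, below the strict threshold
]

fixed = constant_mask(probs)
dynamic = curriculum_mask(probs, class_thresholds(probs, num_classes=3))
print(sum(fixed), sum(dynamic))
```

In this toy batch, the constant threshold discards the moderately confident sample of the harder class, while the per-class threshold admits it. That is the "low utilization of unlabeled data for difficult actions" effect the abstract describes, though the paper's actual loose and strict threshold rules may differ from this sketch.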

Semi-Supervised Action Recognition From Temporal Augmentation Using Curriculum Learning
