Assessing action quality with semantic-sequence performance regression and densely distributed sample weighting

Feng Huang, Jianjun Li
DOI: https://doi.org/10.1007/s10489-024-05349-6
IF: 5.3
2024-03-02
Applied Intelligence
Abstract: Action Quality Assessment (AQA) is a critical branch of video understanding, offering impartial evaluations for competitive sports. Existing paradigms tend to assess action quality using equal-length clips that lack sufficient semantics, leading to suboptimal predictions. To address this issue, we propose to conduct AQA with Semantic-Sequence Performance Regression (SSPR). SSPR first divides an action into a series of unequal-length segments according to the semantic continuity of the video, such as jumping, dropping, and entering the water in diving. Specifically, a recent Temporal Convolutional Network (TCN) is adopted for semantic-sequence segmentation. To better achieve SSPR, we design a feature fusion module that integrates the semantics of each segment using cascaded 1D convolutions. Furthermore, the imbalanced distribution of samples is usually ignored in AQA; to deal with it, we propose a new loss called positive-weighting MSE (PW-MSE). PW-MSE encourages the network to focus more on densely distributed samples during training, which further improves the network's ranking performance. Experimental results on the benchmark datasets (i.e., UNLV-Dive and AQA-7) demonstrate that our proposed method outperforms current state-of-the-art methods.
computer science, artificial intelligence
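
The abstract does not give the formula for PW-MSE, so the following is only a minimal sketch of the general idea it describes: an MSE loss in which samples from densely populated score ranges receive larger weights. The function names `density_weights` and `weighted_mse`, the `bin_width` parameter, and the histogram-based weighting are all assumptions made for illustration and should not be read as the authors' definition.

```python
from collections import Counter


def density_weights(scores, bin_width=10.0):
    """Hypothetical weighting: one weight per sample, proportional to how
    many samples share its score bin, rescaled so the weights average 1."""
    bins = [int(s // bin_width) for s in scores]
    counts = Counter(bins)
    raw = [counts[b] for b in bins]
    scale = len(raw) / sum(raw)  # rescale so the mean weight is 1
    return [r * scale for r in raw]


def weighted_mse(predictions, targets, weights):
    """Mean of per-sample squared errors, each multiplied by its weight."""
    total = sum(w * (p - t) ** 2
                for p, t, w in zip(predictions, targets, weights))
    return total / len(targets)


# Toy ground-truth scores: three samples fall in one bin, one sample is isolated.
targets = [50.0, 52.0, 55.0, 90.0]
predictions = [51.0, 52.0, 55.0, 92.0]
weights = density_weights(targets)
loss = weighted_mse(predictions, targets, weights)
```

In this toy setup the three clustered samples receive a larger weight than the isolated one, so an error on a clustered sample contributes more to the loss than an equal error on the isolated sample.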