End-to-end Action Quality Assessment with Action Parsing Transformer

Hang Fang, Wengang Zhou, Houqiang Li
DOI: https://doi.org/10.1109/vcip59821.2023.10402700
2023-01-01
Abstract: Action Quality Assessment (AQA) plays a crucial role in action understanding, and the task is challenging because the visual differences among actions are subtle. Existing action assessment methods typically make a single overall quality prediction for an entire video. However, parsing the internal structure of an action is important in action quality assessment, because it enhances the interpretability of the scoring process. To explore this underlying structural relationship, we propose an action parsing transformer that decomposes the holistic feature into more fine-grained step-wise representations. Specifically, we use a set of learnable queries to represent the step-wise patterns of a specific action, and our decoding process converts the video representation into a fixed number of step representations. Moreover, to obtain quality scores, we devise a score generation module comprising multiple action scorers, each of which is uniquely associated with specific steps and predicts the corresponding step score. Extensive experiments on two public AQA benchmarks suggest that our method assesses action quality well and achieves outstanding performance.
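The decoding and scoring idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: a single-head, single-layer cross-attention stands in for the transformer decoder, the per-step scorers are plain linear maps, and the final score is taken as the sum of step scores, which is an assumption the abstract does not confirm. All names (`parse_steps`, `score_steps`, `step_queries`, the sizes `T`, `D`, `K`) are hypothetical, and the weights are random rather than learned.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def parse_steps(video_feats, step_queries):
    """Cross-attention from K step queries to T clip features.

    video_feats: (T, D) holistic video representation, one row per clip.
    step_queries: (K, D) one query per action step (learnable in practice).
    Returns (K, D) step representations and the (K, T) attention map.
    """
    d = video_feats.shape[1]
    attn = softmax(step_queries @ video_feats.T / np.sqrt(d), axis=-1)
    return attn @ video_feats, attn

def score_steps(step_feats, scorer_w, scorer_b):
    """One linear scorer per step: scorer k only sees step representation k."""
    return np.einsum("kd,kd->k", step_feats, scorer_w) + scorer_b

rng = np.random.default_rng(0)
T, D, K = 16, 8, 4                       # clips, feature dim, steps (illustrative sizes)
video_feats = rng.normal(size=(T, D))    # stand-in for backbone features
step_queries = rng.normal(size=(K, D))   # random here; trained in a real model
scorer_w = rng.normal(size=(K, D))       # hypothetical per-step scorer weights
scorer_b = np.zeros(K)

step_feats, attn = parse_steps(video_feats, step_queries)
step_scores = score_steps(step_feats, scorer_w, scorer_b)
total_score = step_scores.sum()          # assumption: sum aggregation of step scores
```

Because the number of queries is fixed, the output always has `K` step representations regardless of the video length `T`, and each row of `attn` shows which clips a given step attended to, which is where the interpretability argument comes from.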