STAN: Spatial-Temporal Awareness Network for Temporal Action Detection

Minghao Liu,Haiyi Liu,Sirui Zhao,Fei Ma,Minglei Li,Zonghong Dai,Hao Wang,Tong Xu,Enhong Chen
DOI: https://doi.org/10.1145/3606038.3616169
2023-01-01
Abstract:In recent years, there have been significant advancements in the field of temporal action detection. However, few studies have focused on detecting actions in sporting events. In this context, the MMSports 2023 cricket bowl release challenge aims to identify the bowl release action by segmenting untrimmed videos. To achieve this, we propose a novel cricket bowl release detection framework based on Spatial-Temporal Awareness Network (STAN) which mainly consists of three modules: the spatial feature extraction module (SFEM), the temporal feature extraction module (TFEM), and the classification module (CM). Specifically, we first adopt ResNet to extract the spatial features from videos in SFEM. Then, the TFEM is designed to aggregate temporal features using Bi-LSTM to obtain spatial-temporal features. Afterward, the CM converts the spatial-temporal features into action category probabilities to localize the action segments. Besides, we introduce the weighted binary cross entropy loss to solve the data imbalance problem in cricket bowl release detection. Finally, the experiments show that our proposed STAN achieves competitive performance in 1st place with a PQ score of 0.643 on the cricket bowl release challenge. The code is also publicly available at https://github.com/lmhr/STAN.
What problem does this paper attempt to address?