SpotFormer: A Transformer-based Framework for Precise Soccer Action Spotting

Mengqi Cao, Min Yang, Guozhen Zhang, Xiaotian Li, Yilu Wu, Gangshan Wu, Limin Wang
DOI: https://doi.org/10.1109/mmsp55362.2022.9948888
2022-01-01
Abstract: Action spotting and classification aim to detect the exact moments at which events occur in long videos. Current mainstream spotting methods generally use a two-stage pipeline: feature collection and integration, followed by salient action detection and postprocessing. Following this pipeline, we present SpotFormer, a simple yet effective framework capable of precise action spotting. Specifically, we employ several state-of-the-art backbone networks as auxiliary feature extractors and reduce feature dimensionality in a straightforward and efficient way. The frame-wise features are then fed into a transformer-based spotting network designed to leverage spatiotemporal information. With a model ensemble, we obtain a tight mAP score of 0.609 and achieve state-of-the-art performance on the SoccerNet-v2 dataset.
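The abstract mentions a postprocessing step after salient action detection but does not say which method is used. As a rough illustration only, the sketch below assumes a greedy one-dimensional non-maximum suppression over per-frame confidences for a single action class, a common choice in action spotting. The function name `spot_actions` and the parameters `window` and `threshold` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def spot_actions(
    scores: List[float], window: int = 2, threshold: float = 0.5
) -> List[Tuple[int, float]]:
    """Greedy 1-D non-maximum suppression over per-frame confidences.

    Assumption: this is an illustrative postprocessing step, not the
    method described in the paper. Returns (frame_index, score) pairs,
    highest score first.
    """
    # Visit frames by descending confidence; ties go to the earlier frame.
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    suppressed = [False] * len(scores)
    spots: List[Tuple[int, float]] = []
    for i in order:
        if scores[i] < threshold:
            break  # every remaining frame is below the threshold
        if suppressed[i]:
            continue
        spots.append((i, scores[i]))
        # Suppress all frames within `window` frames of the kept peak.
        lo = max(0, i - window)
        hi = min(len(scores), i + window + 1)
        for j in range(lo, hi):
            suppressed[j] = True
    return spots


# Hypothetical per-frame confidences for one action class.
frame_scores = [0.1, 0.7, 0.9, 0.6, 0.2, 0.1, 0.8, 0.3]
print(spot_actions(frame_scores))
```

A larger `window` merges nearby detections into a single spot, which matters under a tight tolerance metric where duplicate predictions around one event count as false positives.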