Deep learning for action spotting in association football videos

Silvio Giancola, Anthony Cioppa, Bernard Ghanem, Marc Van Droogenbroeck
2024-10-02
Abstract: The task of action spotting consists of both identifying actions and precisely localizing them in time with a single timestamp in long, untrimmed video streams. Automatically extracting those actions is crucial for many sports applications, including sports analytics to produce extended statistics on game actions, coaching to provide support to video analysts, or fan engagement to automatically overlay content in the broadcast when specific actions occur. However, before 2018, no large-scale datasets for action spotting in sports were publicly available, which impeded benchmarking action spotting methods. In response, our team built the largest dataset and the most comprehensive benchmarks for sports video understanding, under the umbrella of SoccerNet. In particular, our dataset contains a subset specifically dedicated to action spotting, called SoccerNet Action Spotting, containing more than 550 complete broadcast games annotated with almost all types of actions that can occur in a football game. This dataset is tailored to develop methods for automatic spotting of actions of interest, including deep learning approaches, by providing a large amount of manually annotated actions. To engage with the scientific community, the SoccerNet initiative organizes yearly challenges, during which participants from all around the world compete to achieve state-of-the-art performance. Thanks to our dataset and challenges, more than 60 methods were developed or published over the past five years, improving on the first baselines and making action spotting a viable option for the sports industry. This paper traces the history of action spotting in sports, from the creation of the task back in 2018, to the role it plays today in research and the sports industry.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the automatic identification and precise temporal localization of specific actions in football videos, a task called "action spotting". Action spotting requires a model both to recognize which action occurs and to mark the moment at which it occurs with a single timestamp. This capability matters for sports analytics, support for coaches and video analysts, and fan engagement.

### Problem Background and Challenges

1. **Lack of large-scale datasets**: Before 2018, no large-scale datasets for action spotting in sports were publicly available, which hindered the benchmarking and comparison of methods. To address this, the authors built SoccerNet, one of the largest sports video understanding datasets at the time.
2. **Complexity of action spotting**: Actions in football games are complex and diverse, and many fine-grained actions are difficult to identify and localize with traditional methods. For example, the exact timestamp of a goal depends on which moment is taken as the reference: the pass, the shot, or the ball entering the net.
3. **Sparsity of actions and near-identical consecutive frames**: No significant action occurs during most of a game, so annotated events are sparse relative to the length of the video. In addition, adjacent frames can be visually almost identical yet carry different labels, which makes models harder to train.

### Solutions

To address these problems, the authors took the following measures:

1. **Constructing the SoccerNet dataset**: SoccerNet includes a subset dedicated to action spotting, SoccerNet Action Spotting, which contains more than 550 complete broadcast games annotated with almost all types of actions that can occur in a football game. This large body of manual annotations supports the development of automatic action spotting methods.
2. **Introducing the action spotting task**: The model must locate each action with a single timestamp in a long, untrimmed video stream. This differs from the traditional temporal activity localization task, which usually defines the start and end times of an action.
3. **Organizing an annual challenge**: The SoccerNet initiative holds a yearly challenge in which researchers from all over the world compete, which drives progress in action spotting. Over the past five years, more than 60 methods have been developed or published, substantially improving on the first baselines.
4. **Applying deep-learning methods**: The paper reviews multiple deep-learning-based action spotting methods, covering techniques such as feature extraction, feature aggregation, and end-to-end training. Trained on large-scale datasets, these methods have significantly improved the accuracy and robustness of action spotting.

### Summary

The paper tackles action spotting in football videos by building large-scale datasets and introducing a new task. Through these efforts, action spotting has become a viable research direction and a practical option for the sports industry, supporting team strategy analysis, fair refereeing, enhanced broadcast experiences, and other applications.
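The single-timestamp formulation can be made concrete with a common post-processing step: a model produces a confidence score per frame for each action class, and spots are read off as isolated peaks of those scores. The sketch below is a hypothetical illustration, not the method of the paper or of any particular SoccerNet entry. It assumes per-frame scores for one class at a fixed frame rate, and the function name `spot_actions` and its `threshold` and `window_s` parameters are illustrative choices.

```python
from typing import List, Tuple


def spot_actions(
    scores: List[float],
    fps: float = 2.0,
    threshold: float = 0.5,
    window_s: float = 10.0,
) -> List[Tuple[float, float]]:
    """Turn per-frame scores for one action class into single-timestamp spots.

    Hypothetical sketch of greedy 1-D non-maximum suppression: frames are
    visited in decreasing score order, and a frame becomes a spot if its score
    reaches `threshold` and no already-kept spot lies within `half_window`
    frames (about window_s / 2 seconds) of it.
    Returns (timestamp_in_seconds, confidence) pairs sorted by time.
    """
    half_window = int(round(window_s * fps / 2))  # suppression radius in frames
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if scores[i] < threshold:
            break  # every remaining frame scores no higher than this one
        if all(abs(i - j) > half_window for j in kept):
            kept.append(i)
    return [(i / fps, scores[i]) for i in sorted(kept)]


if __name__ == "__main__":
    # Toy scores at 1 frame per second: two bursts of high confidence.
    toy = [0.1, 0.2, 0.9, 0.8, 0.3, 0.1, 0.1, 0.7, 0.6, 0.1]
    print(spot_actions(toy, fps=1.0, threshold=0.5, window_s=4.0))
    # [(2.0, 0.9), (7.0, 0.7)]
```

Suppressing neighbouring frames makes one event yield one timestamp instead of a run of adjacent, visually similar frames that all score highly; the window length trades off missing two genuinely close actions against emitting duplicates.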
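Because a spot is a single timestamp, judging it requires deciding how close to the annotation a prediction must be. SoccerNet-style evaluations count a prediction as correct when it falls within a temporal tolerance of a ground-truth action of the same class, and average the resulting scores over several tolerances. The sketch below covers only the matching step for one class and one tolerance, reduced to precision and recall; it is not the official SoccerNet average-mAP implementation. The function name `match_spots` is hypothetical, and treating the tolerance as a maximum absolute offset in seconds is an assumption of this sketch.

```python
from typing import List, Tuple


def match_spots(
    predictions: List[Tuple[float, float]],
    ground_truth: List[float],
    max_offset_s: float,
) -> Tuple[float, float]:
    """Greedily match predicted spots to ground-truth timestamps (one class).

    Simplified sketch: predictions are (timestamp, confidence) pairs visited
    in decreasing confidence; each one is matched to the nearest still
    unmatched annotation with |t_pred - t_gt| <= max_offset_s (assumed
    tolerance convention). Each annotation can be matched at most once.
    Returns (precision, recall).
    """
    matched_gt = set()
    true_positives = 0
    for t_pred, _conf in sorted(predictions, key=lambda p: p[1], reverse=True):
        # Unmatched annotations close enough to this prediction.
        candidates = [
            (abs(t_pred - t_gt), k)
            for k, t_gt in enumerate(ground_truth)
            if k not in matched_gt and abs(t_pred - t_gt) <= max_offset_s
        ]
        if candidates:
            _, best = min(candidates)  # nearest annotation wins
            matched_gt.add(best)
            true_positives += 1
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


if __name__ == "__main__":
    gt = [120.0, 300.0, 1500.0]  # toy annotation times in seconds
    preds = [(118.0, 0.9), (305.0, 0.8), (700.0, 0.6), (121.0, 0.4)]
    p, r = match_spots(preds, gt, max_offset_s=5.0)
    print(f"{p:.2f} {r:.3f}")  # 0.50 0.667
```

Shrinking `max_offset_s` can only keep or reduce the number of matches (in the toy data, an offset of 2 s drops the spot at 305 s), which is why reporting results at several tolerances says more about temporal precision than a single threshold does.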