1st Place Solution to the 1st SkatingVerse Challenge

Tao Sun, Yuanzi Fu, Kaicheng Yang, Jian Wu, Ziyong Feng
2024-04-22
Abstract: This paper presents the winning solution for the 1st SkatingVerse Challenge. We propose a method that involves several steps. To begin, we leverage the DINO framework to extract the Region of Interest (ROI) and perform precise cropping of the raw video footage. Subsequently, we employ three distinct models, namely Unmasked Teacher, UniformerV2, and InfoGCN, to capture different aspects of the data. By ensembling the prediction results based on logits, our solution attains an impressive leaderboard score of 95.73%.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The problem this paper addresses is how to accurately analyze movements in figure skating videos. Specifically, it focuses on building a model that recognizes and classifies the movements performed in figure skating competitions.

### Problem Background

1. **Challenge Background**:
   - The work targets the "1st SkatingVerse Challenge", which is affiliated with the 18th IEEE International Conference on Automatic Face and Gesture Recognition (FG).
   - The challenge provides a dataset of 1,687 continuous videos covering 28 figure skating movement categories.
   - The dataset is divided into 19,993 training video segments and 8,586 test video segments.
2. **Objectives**:
   - Develop an algorithm that accurately identifies the movement shown in each video.
   - Improve the recognition accuracy of figure skating movements, thereby supporting further research in related fields.

### Solutions

To achieve this goal, the authors propose a multi-step method:

1. **Preprocessing Stage**:
   - Use the DINO framework for Region of Interest (ROI) extraction and crop the original video accordingly.
   - Extract video frames with FFmpeg and use the DINO framework to detect human bounding boxes in each frame.
   - Combine the bounding boxes from all frames into a single final detection box, and crop the video to that box.
2. **Model Structure**:
   - Use three different models (Unmasked Teacher, UniformerV2, and InfoGCN) to capture different aspects of the data.
   - Fine-tune these models for the figure skating classification task.
   - Improve overall performance by ensembling the logits predicted by the models.
3. **Model Integration**:
   - Adopt two ensembling strategies: voting and weighted aggregation.
   - The ensembled model scored 95.73% on the leaderboard, outperforming each individual model.
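
The two ensembling strategies named above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy logits, the equal weights, and the tie-breaking rule are all assumptions, since the summary does not state the weights or other details.

```python
from typing import List, Sequence

# Hypothetical layout: logits[m][n][c] is model m's logit
# for sample n and class c.
Logits = Sequence[Sequence[Sequence[float]]]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def weighted_logit_ensemble(logits: Logits, weights: Sequence[float]) -> List[int]:
    """Weighted aggregation: sum weighted logits per class, then take argmax."""
    if len(logits) != len(weights):
        raise ValueError("need exactly one weight per model")
    num_samples = len(logits[0])
    num_classes = len(logits[0][0])
    predictions = []
    for n in range(num_samples):
        fused = [
            sum(w * model[n][c] for w, model in zip(weights, logits))
            for c in range(num_classes)
        ]
        predictions.append(argmax(fused))
    return predictions


def majority_vote(logits: Logits) -> List[int]:
    """Voting: each model votes for its own argmax class.

    Ties are resolved in favor of the earliest model's vote
    (an assumption made for this sketch).
    """
    num_samples = len(logits[0])
    predictions = []
    for n in range(num_samples):
        votes = [argmax(model[n]) for model in logits]
        predictions.append(max(votes, key=votes.count))
    return predictions


# Toy logits for three models, two samples, three classes (made-up numbers).
model_a = [[2.0, 1.0, 0.0], [0.0, 0.5, 0.4]]
model_b = [[0.0, 3.0, 0.0], [0.0, 0.2, 0.9]]
model_c = [[1.5, 1.0, 0.0], [0.1, 0.3, 0.8]]
all_logits = [model_a, model_b, model_c]

weighted = weighted_logit_ensemble(all_logits, weights=[1.0, 1.0, 1.0])
voted = majority_vote(all_logits)
print(weighted)  # [1, 2]
print(voted)     # [0, 2]
```

On the first toy sample the two strategies disagree: two models narrowly prefer class 0, but the third model's strong logit for class 1 dominates the weighted sum. Logit-based aggregation keeps each model's confidence, whereas voting discards it.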
### Results

With the above method, the authors improved the recognition accuracy of figure skating movements and achieved the top leaderboard score, which supports the effectiveness of the approach.

### Formula Representation

Model performance is evaluated with the mean per-category accuracy:

\[ \text{Mean} = \frac{1}{l} \sum_{i = 1}^{l} \frac{M_i}{N_i} \]

where:

- \( l \) is the number of categories.
- \( M_i \) is the number of correctly predicted samples in the \( i \)-th category.
- \( N_i \) is the total number of samples in the \( i \)-th category.

Because every category contributes equally to the mean regardless of how many samples it contains, frequent categories do not dominate the score.
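
As an illustration of the formula, the following sketch computes the mean per-category accuracy from integer labels. The function name and the toy labels are hypothetical, and the sketch assumes \( l \) counts the categories that appear in the ground truth.

```python
from collections import defaultdict
from typing import Sequence


def mean_class_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Average of the per-category accuracies M_i / N_i."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = defaultdict(int)    # N_i: samples per category
    correct = defaultdict(int)  # M_i: correct predictions per category
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    if not total:
        raise ValueError("no samples given")
    # Each category contributes one term, whatever its size.
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy example with three categories of unequal size (made-up labels).
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 0, 2]

score = mean_class_accuracy(y_true, y_pred)
print(score)  # 0.75
```

Here the per-category accuracies are 3/4, 1/2, and 1/1, so the mean is 0.75, while the plain overall accuracy would be 5/7. The gap shows how the metric weights small categories as heavily as large ones.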