VideoBadminton: A Video Dataset for Badminton Action Recognition

Qi Li, Tzu-Chen Chiu, Hsiang-Wei Huang, Min-Te Sun, Wei-Shinn Ku
2024-03-19
Abstract: In the dynamic and evolving field of computer vision, action recognition has become a key focus, especially with the advent of sophisticated methodologies like Convolutional Neural Networks (CNNs), Convolutional 3D, Transformer, and spatial-temporal feature fusion. These technologies have shown promising results on well-established benchmarks but face unique challenges in real-world applications, particularly in sports analysis, where the precise decomposition of activities and the distinction of subtly different actions are crucial. Existing datasets like UCF101, HMDB51, and Kinetics have offered a diverse range of video data for various scenarios. However, there's an increasing need for fine-grained video datasets that capture detailed categorizations and nuances within broader action categories. In this paper, we introduce the VideoBadminton dataset derived from high-quality badminton footage. Through an exhaustive evaluation of leading methodologies on this dataset, this study aims to advance the field of action recognition, particularly in badminton sports. The introduction of VideoBadminton could not only serve for badminton action recognition but also provide a dataset for recognizing fine-grained actions. The insights gained from these evaluations are expected to catalyze further research in action comprehension, especially within sports contexts.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper introduces VideoBadminton, a video dataset designed specifically for badminton action recognition. Deep learning methods such as 3D convolutional networks and Transformers have achieved strong results on established benchmarks, but real-world sports analysis remains challenging because it requires decomposing activities accurately and distinguishing actions that differ only subtly. Existing video datasets such as UCF101, HMDB51, and Kinetics cover diverse scenes, yet the demand for datasets that support fine-grained action recognition is growing. VideoBadminton is built from high-quality badminton match videos and is intended to advance action recognition research, particularly for badminton. Beyond badminton itself, it also serves as a dataset for fine-grained action recognition in general. Insights from evaluating leading methods on it are expected to encourage further research on action understanding, especially in sports contexts.
The paper describes the dataset creation process in detail, including recording, preprocessing, manual annotation, and expert verification. VideoBadminton includes 18 badminton action categories, totaling 7,822 segments and covering 145 minutes of self-recorded video. It also provides detailed annotations of player positions and shuttlecock trajectories, which the paper presents as making it the most comprehensive dataset for badminton video analysis.
In addition, the paper reviews existing action recognition models, from basic CNN- and RNN-based models to more complex ones such as 3D CNNs and Transformers. VideoBadminton gives these models a new benchmark on which to test their performance and applicability to badminton action recognition.
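For a rough sense of scale, the figures quoted above (18 categories, 7,822 segments, 145 minutes) can be turned into per-clip and per-class averages. The sketch below is only back-of-the-envelope arithmetic: it assumes the 145 minutes refers to the combined duration of the annotated segments, which the summary does not state explicitly, and the function name `dataset_scale` is hypothetical. The per-class figure is a plain mean and says nothing about how the classes are actually balanced.

```python
# Figures quoted in the summary above.
NUM_CLASSES = 18
NUM_CLIPS = 7822
TOTAL_MINUTES = 145  # assumption: total duration of the annotated segments


def dataset_scale(num_classes, num_clips, total_minutes):
    """Return simple averages derived from the headline dataset figures."""
    total_seconds = total_minutes * 60
    return {
        "mean_clip_seconds": total_seconds / num_clips,
        "mean_clips_per_class": num_clips / num_classes,
    }


stats = dataset_scale(NUM_CLASSES, NUM_CLIPS, TOTAL_MINUTES)
print(f"{stats['mean_clip_seconds']:.2f} s per clip on average")
print(f"{stats['mean_clips_per_class']:.0f} clips per class on average")
```

Under that assumption, the average clip lasts only about a second, which is consistent with a dataset of individual strokes rather than whole rallies.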
Overall, the main goal of this paper is to promote research in badminton action recognition through the creation and evaluation of the VideoBadminton dataset, providing valuable data resources for finer-grained action recognition.