A mid-level representation framework for semantic sports video analysis.

Ling-Yu Duan, Min Xu, Tat-Seng Chua, Qi Tian, Changsheng Xu
DOI: https://doi.org/10.1145/957013.957020
2003-01-01
Abstract: Sports video has been widely studied due to its tremendous commercial potential. Despite encouraging results for various specific sports, it is almost impossible to extend an existing system to a new sport, because such systems usually employ sets of low-level features chosen for a specific game and closely coupled with game-specific rules to detect events or highlights. What is lacking is an internal representation and structure generic enough to apply to many different sports. In this paper, we present a generic mid-level representation framework for semantic sports video analysis. The mid-level representation layer is introduced between low-level audio-visual processing and high-level semantic analysis. It allows us to separate sports-specific knowledge and rules from low-level and mid-level feature extraction. This makes sports video analysis more efficient, more effective, and less ad hoc across various types of sports. To make the low-level feature analysis robust, a non-parametric clustering technique, the mean shift procedure, has been successfully applied to both color and motion analysis. The proposed framework has been tested on five field-ball sports, covering about 8 hours of video. Experiments have shown its robust performance in semantic analysis and event detection. We believe that the proposed mid-level representation framework can be used for event detection, highlight extraction, summarization, and personalization of many types of sports video.
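The abstract names mean shift as the non-parametric clustering used for color and motion analysis but gives no details. As a rough illustration only, the sketch below shows a generic Gaussian-kernel mean shift: each feature vector is repeatedly moved to the kernel-weighted mean of the data until it settles on a density mode, and points that reach the same mode form one cluster. This is not the authors' implementation; the function names, the Gaussian kernel, the `bandwidth`, the convergence tolerance, and the half-bandwidth merge rule are all assumptions, and the sample points stand in for hypothetical normalized color or motion features.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _gaussian_weight(x: Sequence[float], y: Sequence[float], bandwidth: float) -> float:
    """Gaussian kernel weight based on the squared distance between x and y."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * bandwidth ** 2))


def _shift_to_mode(start: Sequence[float], data: Sequence[Sequence[float]],
                   bandwidth: float, tol: float = 1e-4, max_iter: int = 300) -> Point:
    """Move a point to the kernel-weighted mean of the data until the shift is tiny."""
    x = list(start)
    for _ in range(max_iter):
        weights = [_gaussian_weight(x, p, bandwidth) for p in data]
        total = sum(weights)
        new_x = [sum(w * p[d] for w, p in zip(weights, data)) / total
                 for d in range(len(x))]
        shift = math.dist(new_x, x)
        x = new_x
        if shift < tol:
            break
    return tuple(x)


def mean_shift(data: Sequence[Sequence[float]], bandwidth: float) -> Tuple[List[Point], List[int]]:
    """Cluster points by the density mode each one converges to.

    Returns (modes, labels), where labels[i] indexes into modes.
    """
    modes: List[Point] = []
    labels: List[int] = []
    for p in data:
        m = _shift_to_mode(p, data, bandwidth)
        # Assumed merge rule: modes closer than half a bandwidth are the same cluster.
        for i, existing in enumerate(modes):
            if math.dist(m, existing) < bandwidth / 2:
                labels.append(i)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels


if __name__ == "__main__":
    # Hypothetical 2-D features, e.g. two dominant colors in normalized units.
    samples = [(0.10, 0.20), (0.12, 0.22), (0.11, 0.19),
               (0.80, 0.85), (0.82, 0.83), (0.79, 0.86)]
    modes, labels = mean_shift(samples, bandwidth=0.1)
    print(modes, labels)
```

Because mean shift needs no preset number of clusters, only a bandwidth, it suits settings such as dominant-color extraction where the number of color groups is not known in advance; the bandwidth then controls how coarse the resulting clusters are.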