Sports highlights generation based on acoustic events detection: A rugby case study

Anant Baijal, Jaeyoun Cho, Woojung Lee, Byeong-Seob Ko
DOI: https://doi.org/10.1109/ICCE.2015.7066303
2015-09-18
Abstract: We approach the challenging problem of generating highlights from sports broadcasts utilizing audio information only. A language-independent, multi-stage classification approach is employed for detection of key acoustic events which then act as a platform for summarization of highlight scenes. Objective results and human experience indicate that our system is highly efficient.
Sound, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses how to automatically generate highlights from rugby match broadcasts using audio information only. Specifically, it proposes a multi-stage classification method that detects key acoustic events and uses them to generate match highlights.

### Background of the Paper

- **Research Field**: Sports video summarization, or sports highlight detection.
- **Existing Research**: Many studies have used audio, visual, and textual cues to generate sports highlights, but relatively little work has focused on rugby.
- **Challenges**:
  - Scoring events in rugby (such as tries) do not produce distinctive acoustic features.
  - Broadcasts contain random noise, such as music and shouting.
  - The crowd cheers for most of the match, so the acoustic features vary little over time.
  - Acoustic environments and properties vary greatly from match to match.

### Research Objectives

- **Acoustic Event Detection**: Detect key acoustic events with high recall and few false detections (high precision).
- **Highlight Generation**: Use the detected acoustic events to generate highlight scenes.

### Methodology

1. **Training Phase**:
   - **Data Annotation**: Label key acoustic events such as referee whistles and commentators' excited speech.
   - **Preprocessing and Feature Extraction**: Convert the input audio signal to the spectral domain and extract Mel-frequency cepstral coefficients (MFCCs) and their first-order delta coefficients (delta-MFCCs).
   - **Learning**: Model the extracted features with Gaussian Mixture Models (GMMs).
2. **Highlight Generation Engine**:
   - **Preprocessing and Feature Extraction**: Same as in the training phase.
   - **Multi-stage Classification**: First classify audio frames as speech or non-speech, then classify them further as excited speech, non-excited speech, whistle, or other events.
   - **Post-processing**: Use a sliding window to determine the start and end points of each highlight scene, so that the generated scenes are smooth and complete.

### Experimental Results

- **Objective Evaluation**: The method achieves very high recall and precision in detecting key acoustic events.
- **User Experience**: Eleven subjects rated the generated highlight scenes, giving a mean opinion score (MOS) of 4.23, which indicates a very positive user experience.

### Conclusion

The proposed method effectively generates highlight scenes from rugby match broadcasts, with high recall and precision. It can be embedded in consumer electronics devices and suits both online (live TV broadcast) and offline (stored sports multimedia) scenarios.
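The feature-extraction step names delta-MFCCs. These are commonly computed with a regression over neighbouring frames. The following is a minimal sketch of that standard formula, not the authors' code. It assumes MFCC frames are already available as lists of floats, since MFCC extraction itself would normally come from an audio library. The function names `delta_features` and `mfcc_with_deltas` and the default half-width `n_width=2` are illustrative choices, not settings reported in the paper.

```python
from typing import List

Frame = List[float]  # one feature vector (e.g. a set of MFCCs) per analysis frame


def delta_features(frames: List[Frame], n_width: int = 2) -> List[Frame]:
    """First-order delta coefficients via the regression formula

        d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)

    Indices outside the sequence are clamped to the first/last frame
    (edge replication).
    """
    if n_width < 1:
        raise ValueError("n_width must be at least 1")
    if not frames:
        return []
    num_frames, dim = len(frames), len(frames[0])
    denom = 2 * sum(n * n for n in range(1, n_width + 1))
    deltas: List[Frame] = []
    for t in range(num_frames):
        acc = [0.0] * dim
        for n in range(1, n_width + 1):
            nxt = frames[min(t + n, num_frames - 1)]  # clamp at the end
            prv = frames[max(t - n, 0)]               # clamp at the start
            for k in range(dim):
                acc[k] += n * (nxt[k] - prv[k])
        deltas.append([v / denom for v in acc])
    return deltas


def mfcc_with_deltas(mfcc: List[Frame], n_width: int = 2) -> List[Frame]:
    """Append each frame's delta vector to its MFCC vector (MFCC + delta-MFCC)."""
    return [c + d for c, d in zip(mfcc, delta_features(mfcc, n_width))]
```

For example, `delta_features([[0.0], [1.0], [2.0]], n_width=1)` returns `[[0.5], [1.0], [0.5]]`. The edge frames get smaller deltas because the clamped neighbours repeat the boundary value.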
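The learning and multi-stage classification steps fit together naturally. Each acoustic class gets its own GMM, and a frame is assigned to the class whose model scores it highest. The sketch below assumes diagonal-covariance GMMs with already-trained parameters (EM training is omitted). It also assumes a hierarchy in which stage 1 decides speech versus non-speech and stage 2 chooses among the labels grouped under that decision, for example excited versus non-excited speech. That routing, the class names, and the helpers `DiagGMM`, `best_label`, and `classify_frame` are hypothetical. The summary does not specify how the paper's stages are wired or how its models are trained.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class DiagGMM:
    """Gaussian mixture model with diagonal covariances (hypothetical parameters)."""
    weights: List[float]          # mixture weights (should sum to 1)
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # one variance vector per component (all > 0)

    def log_likelihood(self, x: Sequence[float]) -> float:
        """log p(x) = log sum_k w_k N(x; mu_k, diag(var_k)), computed with log-sum-exp."""
        comp = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, m, v in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
            comp.append(ll)
        peak = max(comp)
        return peak + math.log(sum(math.exp(c - peak) for c in comp))


def best_label(models: Dict[str, DiagGMM], x: Sequence[float]) -> str:
    """Return the class whose GMM assigns x the highest log-likelihood."""
    return max(models, key=lambda name: models[name].log_likelihood(x))


def classify_frame(
    stage1: Dict[str, DiagGMM],
    stage2: Dict[str, Dict[str, DiagGMM]],
    x: Sequence[float],
) -> str:
    """Stage 1 picks a coarse class (e.g. speech / non-speech); stage 2 picks
    a fine label using only the models grouped under that coarse class."""
    coarse = best_label(stage1, x)
    return best_label(stage2[coarse], x)
```

Scoring in the log domain with log-sum-exp avoids the underflow that multiplying many small Gaussian densities would cause on higher-dimensional MFCC + delta-MFCC vectors.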
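The sliding-window post-processing step can be approximated as follows. This is a hypothetical sketch, not the paper's procedure. It assumes per-frame flags that mark key events (e.g. excited speech or a whistle) and removes isolated flags with a majority vote over a centred window. It then pads each surviving run with context frames, so that a highlight starts before the event and ends after it. The window length and the `pre`/`post` padding values are illustrative parameters.

```python
from typing import List, Tuple


def smooth_flags(flags: List[bool], window: int) -> List[bool]:
    """Majority vote over a centred window of odd length; the window is
    truncated at the sequence edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for t in range(len(flags)):
        lo, hi = max(0, t - half), min(len(flags), t + half + 1)
        votes = sum(flags[lo:hi])
        smoothed.append(2 * votes > hi - lo)  # strict majority of frames in the window
    return smoothed


def highlight_segments(
    flags: List[bool], window: int = 5, pre: int = 3, post: int = 2
) -> List[Tuple[int, int]]:
    """Turn per-frame key-event flags into (start, end) frame ranges, end
    exclusive. Each smoothed run is extended by `pre` frames before and
    `post` frames after; ranges that touch or overlap are merged."""
    smoothed = smooth_flags(flags, window)
    n = len(smoothed)
    segments: List[Tuple[int, int]] = []
    t = 0
    while t < n:
        if not smoothed[t]:
            t += 1
            continue
        run_start = t
        while t < n and smoothed[t]:
            t += 1
        start, end = max(0, run_start - pre), min(n, t + post)
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], max(segments[-1][1], end))
        else:
            segments.append((start, end))
    return segments
```

With `window=3`, a single isolated flagged frame is voted away. By contrast, `highlight_segments([False] * 10 + [True] * 5 + [False] * 10, window=3, pre=2, post=2)` keeps the five-frame run and returns `[(8, 17)]`.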
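The objective evaluation reports recall and precision for key acoustic events. One common way to compute these for time-stamped events is to match each detection to at most one reference event within a time tolerance. The sketch below assumes that protocol, a greedy nearest-match strategy, and a hypothetical 1-second tolerance. The paper's actual evaluation protocol may differ, and the function name `event_precision_recall` is illustrative.

```python
from typing import List, Optional, Tuple


def event_precision_recall(
    reference: List[float], detected: List[float], tolerance: float = 1.0
) -> Tuple[float, float]:
    """Greedy one-to-one matching of detected event times to reference event
    times (both in seconds). Each detection, taken in time order, claims the
    nearest still-unmatched reference within `tolerance`. By convention here,
    precision is 0.0 with no detections and recall is 0.0 with no references."""
    unmatched = sorted(reference)
    hits = 0
    for d in sorted(detected):
        best: Optional[int] = None
        for i, r in enumerate(unmatched):
            if abs(d - r) <= tolerance and (
                best is None or abs(d - r) < abs(d - unmatched[best])
            ):
                best = i
        if best is not None:
            unmatched.pop(best)  # a reference event can be matched only once
            hits += 1
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall
```

For example, `event_precision_recall([10.0, 20.0, 30.0], [10.4, 19.5, 45.0])` gives a precision and recall of 2/3 each. Two detections fall within a second of a reference event, and the detection at 45.0 s counts as a false alarm.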