Saliency Prediction of Sports Videos: A Large-Scale Database and a Self-Adaptive Approach

Minglang Qiao, Mai Xu, Shijie Wen, Lai Jiang, Shengxi Li, Tao Xu, Yunjin Chen, Leonid Sigal
DOI: https://doi.org/10.1109/icassp48485.2024.10446481
2024-01-01
Abstract: Predicting video saliency is crucial for improving the efficiency of sports video processing, thereby providing an enriched viewing experience for a wide-ranging audience. However, there has long been a lack of well-established eye-tracking databases and learning-based approaches tailored specifically to sports videos. In this paper, we establish a large-scale eye-tracking database dubbed audio-visual sports (AVS). AVS consists of 1,000 high-quality sports videos with eye fixations from 60 participants. Through data analysis on AVS, we observe that human attention patterns vary significantly with the specific scene context of the sport. Motivated by this, we propose a sport-aware audio-visual saliency model that adaptively learns the scene context through a hyper network. Specifically, a new audio-visual fusion (AVF) block is developed to effectively fuse features from the visual and audio backbones. After that, a hyper network is introduced to learn sport-aware priors, which are then used to guide the self-adaptive saliency predictor in predicting the saliency map. Experimental results demonstrate that our approach outperforms other state-of-the-art saliency prediction models on the only two sports video eye-tracking databases.
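The pipeline outlined in the abstract (fuse audio and visual features, let a hyper network produce sport-aware parameters, then apply them in the saliency predictor) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the AVF block, the hyper network, or the predictor, so every name, shape, and operation below (`fuse`, `scene_context`, `hyper_network`, `predict_saliency`, concatenation as fusion, mean pooling as scene context, a linear hyper network, a per-location logistic predictor) is a hypothetical stand-in chosen only to show the general idea of context-generated predictor weights.

```python
import math
import random


def fuse(visual, audio):
    """Hypothetical stand-in for the AVF block: append the audio vector
    to the visual feature vector at every spatial location."""
    return [[cell + audio for cell in row] for row in visual]


def scene_context(fused):
    """Assumed scene descriptor: mean of the fused features over all locations."""
    cells = [cell for row in fused for cell in row]
    dim = len(cells[0])
    return [sum(cell[i] for cell in cells) / len(cells) for i in range(dim)]


def hyper_network(context, hyper_weights):
    """Assumed linear hyper network: maps the context vector to the
    parameters (weights followed by one bias) of the saliency predictor."""
    return [sum(w * c for w, c in zip(row, context)) for row in hyper_weights]


def predict_saliency(fused, params):
    """Per-location logistic predictor whose parameters were generated
    by the hyper network instead of being fixed after training."""
    *weights, bias = params
    return [
        [
            1.0 / (1.0 + math.exp(-(sum(w * f for w, f in zip(weights, cell)) + bias)))
            for cell in row
        ]
        for row in fused
    ]


rng = random.Random(0)
# Toy inputs: a 2x3 grid of 4-dim visual features and one 2-dim audio vector.
visual = [[[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)] for _ in range(2)]
audio = [rng.uniform(-1, 1) for _ in range(2)]

fused = fuse(visual, audio)                       # 2x3 grid of 6-dim features
context = scene_context(fused)                    # 6-dim scene descriptor
# Hyper network parameters: 7 outputs (6 weights + 1 bias), 6 inputs each.
hyper_weights = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(7)]
params = hyper_network(context, hyper_weights)    # context-dependent predictor parameters
saliency = predict_saliency(fused, params)        # 2x3 map with values in (0, 1)
```

The point of the design is that `params` depends on the input clip's context, so two clips from different sports would be scored by different predictor weights even though `hyper_weights` is shared.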