Visual-aural attention modeling for talk show video highlight detection

Yijia Zheng, Guangyu Zhu, Shuqiang Jiang, Qingming Huang, Wen Gao
DOI: https://doi.org/10.1109/ICASSP.2008.4518084
2008-01-01
ICASSP
Abstract: In this paper, we propose a video content analysis approach based on visual-aural attention modeling, which can be used to automatically detect highlights in a popular type of TV program, the talk show. First, visual and aural affective features are extracted to represent and model human attention to highlights. For efficiency, the set of affective features is kept as small as possible. Then, a specific fusion strategy called ordinal-decision is used to combine the visual and aural attention models and form the attention curve for a video. This curve reflects how human attention changes while watching TV. Finally, highlight segments are located at the peaks of the attention curve. Moreover, sentence boundary detection is used to refine the highlight boundaries so that the segments remain complete and fluent. The framework is extensible and flexible: more affective features can be integrated with a variety of fusion schemes. Experimental results demonstrate that the proposed visual-aural attention analysis approach is effective for talk show video highlight detection.
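The pipeline described in the abstract (fuse per-modality attention, locate peaks, refine boundaries at sentence ends) can be illustrated with a minimal sketch. The abstract does not define the ordinal-decision strategy, so the rank-averaging fusion below is only an assumed stand-in, not the authors' method. All function names, the threshold, and the window half-width are hypothetical, and the inputs are assumed to be per-frame (or per-shot) attention scores plus a sorted list of sentence boundary indices.

```python
from bisect import bisect_left, bisect_right


def rank_normalize(values):
    """Map each value to its rank scaled to [0, 1]; ties share the average rank."""
    n = len(values)
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of equal values starting at position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg / (n - 1)
        i = j + 1
    return ranks


def fuse_by_rank(visual, aural):
    """Assumed stand-in for ordinal-decision fusion: average the two rank curves."""
    rv, ra = rank_normalize(visual), rank_normalize(aural)
    return [(a + b) / 2.0 for a, b in zip(rv, ra)]


def find_peaks(curve, threshold):
    """Return indices of interior local maxima at or above the threshold."""
    peaks = []
    for i in range(1, len(curve) - 1):
        if curve[i] >= threshold and curve[i] > curve[i - 1] and curve[i] >= curve[i + 1]:
            peaks.append(i)
    return peaks


def snap_to_sentences(peak, half_width, boundaries):
    """Widen [peak - half_width, peak + half_width] outward to sentence boundaries.

    boundaries must be sorted; if no boundary exists on one side,
    that side of the window is left unchanged.
    """
    start, end = peak - half_width, peak + half_width
    lo = bisect_right(boundaries, start) - 1  # last boundary <= start
    hi = bisect_left(boundaries, end)         # first boundary >= end
    new_start = boundaries[lo] if lo >= 0 else start
    new_end = boundaries[hi] if hi < len(boundaries) else end
    return new_start, new_end


def detect_highlights(visual, aural, boundaries, threshold=0.7, half_width=1):
    """Fuse the two curves, pick peaks, and refine each segment's boundaries."""
    curve = fuse_by_rank(visual, aural)
    return [snap_to_sentences(p, half_width, boundaries)
            for p in find_peaks(curve, threshold)]
```

Working on ranks rather than raw scores means the two modalities need no common scale, which is one plausible reading of an "ordinal" fusion; a different scheme could be swapped in by replacing `fuse_by_rank` alone.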