RUCMM at MediaEval 2015 Affective Impact of Movies Task: Fusion of Audio and Visual Cues.

Qin Jin, Xirong Li, Haibing Cao, Yujia Huo, Shuai Liao, Gang Yang, Jieping Xu
2015-01-01
Abstract: This paper summarizes our efforts for the first-time participation in the Violent Scene Detection subtask of the MediaEval 2015 Affective Impact of Movies Task. We build violent scene detectors using both audio and visual cues. In particular, the audio cue is represented by bag-of-audio-words with Fisher vector encoding. The visual cue is exploited by extracting CNN features from video frames. The detectors are implemented using two-class linear SVM classifiers. Evaluation shows that the audio detectors and the visual detectors are comparable and complementary to each other. Among our submissions, multi-modal late fusion leads to the best performance.
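As a rough illustration of multi-modal late fusion, the sketch below combines per-segment scores from an audio detector and a visual detector by a weighted average and then ranks the segments. The abstract does not specify the fusion rule or weights, so the function names `late_fusion` and `rank_segments`, the `audio_weight` parameter, and the equal default weighting are all assumptions made for illustration, not the authors' method. It also assumes the two detectors' scores are already on comparable scales.

```python
def late_fusion(audio_scores, visual_scores, audio_weight=0.5):
    """Fuse two detectors' per-segment scores by a weighted average.

    The weighting scheme is hypothetical; the abstract only states that
    late fusion was used, not how the scores were combined.
    """
    if len(audio_scores) != len(visual_scores):
        raise ValueError("both detectors must score the same segments")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    visual_weight = 1.0 - audio_weight
    return [
        audio_weight * a + visual_weight * v
        for a, v in zip(audio_scores, visual_scores)
    ]


def rank_segments(fused_scores):
    """Return segment indices ordered from most to least likely violent."""
    return sorted(
        range(len(fused_scores)),
        key=lambda i: fused_scores[i],
        reverse=True,
    )


if __name__ == "__main__":
    # Made-up decision values for three video segments.
    audio = [1.0, -1.0, 0.5]
    visual = [0.0, 1.0, 1.5]
    fused = late_fusion(audio, visual)
    print(fused, rank_segments(fused))
```

Because fusion happens on scores rather than features, each detector can be trained and tuned independently, which is one common reason to prefer late fusion when the modalities are complementary.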