Exploiting evidential theory in the fusion of textual, audio, and visual modalities for affective music video retrieval

Shahla Nemati, Ahmad Reza Naghsh-Nilchi
DOI: https://doi.org/10.1109/pria.2017.7983051
2017-04-01
Abstract: Developing techniques to retrieve video content with regard to its impact on viewers' emotions is the main goal of affective video retrieval systems. Existing systems mainly apply a multimodal approach that fuses information from different modalities to specify the affect category. In this paper, the effect of exploiting two types of textual information to enrich the audio-visual content of music videos is evaluated: subtitles or song lyrics, and texts obtained from viewers' comments on video-sharing websites. To specify the emotional content of the texts, an unsupervised lexicon-based method is applied. This method does not need any human-coded corpus for training and is much faster than supervised approaches. To integrate these modalities, a new information fusion method based on the Dempster-Shafer theory of evidence is proposed. Experiments are conducted on the video clips of the DEAP dataset and their associated viewers' comments on YouTube. Results show that incorporating song lyrics with the audio-visual content has no positive effect on retrieval performance, whereas exploiting viewers' comments significantly improves the affective retrieval system. This could be explained by the fact that viewers' affective responses depend not only on the video itself but also on its context.
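
The abstract does not spell out the proposed fusion method, so the following is only a minimal sketch of the classical Dempster rule of combination on which evidence-theoretic fusion is built, not the authors' method. The two-class frame ("high" and "low"), the function name combine, and all mass values are hypothetical and chosen purely for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with the classical Dempster rule.

    Each mass function maps a frozenset of labels (a focal element)
    to a mass in [0, 1], with the masses summing to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence supports the intersection of the two sets.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Disjoint focal elements contribute to the conflict mass.
            conflict += mass_a * mass_b
    normalizer = 1.0 - conflict
    if normalizer <= 1e-12:
        raise ValueError("sources are in total conflict; the rule is undefined")
    return {focal: mass / normalizer for focal, mass in combined.items()}


# Hypothetical frame of discernment: two affect classes.
HIGH = frozenset({"high"})
LOW = frozenset({"low"})
EITHER = HIGH | LOW  # mass on the whole frame expresses ignorance

# Hypothetical evidence from two modalities (values are made up).
audio_visual = {HIGH: 0.6, LOW: 0.1, EITHER: 0.3}
comments = {HIGH: 0.5, LOW: 0.2, EITHER: 0.3}

fused = combine(audio_visual, comments)
for focal, mass in sorted(fused.items(), key=lambda item: -item[1]):
    print(sorted(focal), round(mass, 4))
```

Assigning mass to the whole frame lets a modality express uncertainty without committing to a class, which is one reason evidence theory is attractive for combining sources of differing reliability.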