Abstract: Without doubt, general video and sound, as found in large multimedia archives, carry emotional information. Thus, audio and video retrieval by emotional categories or dimensions could play a central role in tomorrow's intelligent systems, enabling search for movies with a particular mood, computer-aided scene and sound design to elicit certain emotions in the audience, etc. Yet the lion's share of research in affective computing focuses exclusively on signals conveyed by humans, such as affective speech. Uniting the fields of multimedia retrieval and affective computing is believed to lead to a multiplicity of interesting retrieval applications and, at the same time, to benefit affective computing research by moving its methodology "out of the lab" to real-world, diverse data. In this contribution, we address the problem of finding "disturbing" scenes in movies, a scenario that is highly relevant for computer-aided parental guidance. We apply large-scale segmental feature extraction combined with audio-visual classification to the particular task of detecting violence. Our system performs fully data-driven analysis, including automatic segmentation. We evaluate the system in terms of mean average precision (MAP) on the official data set of the MediaEval 2012 evaluation campaign's Affect Task, which consists of 18 original Hollywood movies, achieving up to 0.398 MAP on unseen test data under fully realistic conditions. An in-depth analysis of the worth of individual features with respect to the target class and of the system errors is carried out and reveals the importance of peak-related audio feature extraction and low-level histogram-based video analysis.
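
For readers unfamiliar with the evaluation measure named in the abstract, the following is a minimal sketch of how mean average precision can be computed over ranked lists of segments. It assumes the generic textbook definition (average precision per ranked list, then the mean over lists). The exact variant used in the evaluation campaign, such as any cutoff on the ranked list, is not stated here, and the function names and toy scores are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Average precision of one ranked list.

    Segments are ranked by descending score; precision is measured at
    every rank that holds a positive (label 1) and then averaged.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    # By convention in this sketch, a list without positives scores 0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[float], Sequence[int]]]
) -> float:
    """Mean of the per-list average precisions (e.g. one list per movie)."""
    aps = [average_precision(scores, labels) for scores, labels in runs]
    return sum(aps) / len(aps) if aps else 0.0


# Hypothetical toy data: classifier confidences and violence labels per segment.
movie_a = ([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
movie_b = ([0.2, 0.9], [1, 0])
toy_map = mean_average_precision([movie_a, movie_b])
```

In this toy case the first list places positives at ranks 1 and 3 and the second places its only positive at rank 2, so the measure rewards systems that rank violent segments ahead of non-violent ones rather than systems that merely classify them correctly at a fixed threshold.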

RUCMM at MediaEval 2015 Affective Impact of Movies Task: Fusion of Audio and Visual Cues.

Fudan-Huawei at MediaEval 2015: Detecting Violent Scenes and Affective Impact in Movies with Deep Learning.

Fudan at MediaEval 2013: Violent Scenes Detection Using Motion Features and Part-Level Attributes.

Emotion Recognition in Videos via Fusing Multimodal Features.

The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Trajectory-based Features.

Fudan-NJUST at MediaEval 2014: Violent Scenes Detection Using Deep Neural Networks.

Detecting Violent Scenes in Movies by Auditory and Visual Cues.

Affective video retrieval: violence detection in Hollywood movies by large-scale segmental feature extraction

Audiovisual Dependency Attention for Violence Detection in Videos

Benchmarking Violent Scenes Detection in Movies.

Detecting Violence in Video using Subclasses

Violent Video Detection Based on Semantic Correspondence.

A Benchmarking Campaign for the Multimodal Detection of Violent Scenes in Movies

Visual-Audio Emotion Recognition Based on Multi-Task and Ensemble Learning with Multiple Features

GLA in MediaEval 2018 Emotional Impact of Movies Task

Investigation of Multimodal Features, Classifiers and Fusion Methods for Emotion Recognition

Multimodal Utterance-level Affect Analysis using Visual, Audio and Text Features

Audio-Visual Emotion Recognition Based on Facial Expression and Affective Speech

Look, Listen and Pay More Attention: Fusing Multi-Modal Information for Video Violence Detection

Audio-Visual Fusion Network Based on Conformer for Multimodal Emotion Recognition

Efficient Feature Extraction and Late Fusion Strategy for Audiovisual Emotional Mimicry Intensity Estimation