Violent Scene Detection Using Convolutional Neural Networks and Deep Audio Features

Guankun Mu, Haibing Cao, Qin Jin
DOI: https://doi.org/10.1007/978-981-10-3005-5_37
2016-01-01
Abstract: Violent scene detection (VSD) in videos has practical significance in various applications, such as film rating and protecting children from violent content. Most previous VSD systems have relied mainly on visual cues in the video, although acoustic cues can also help to detect violent scenes, especially when visual cues are unreliable. In this paper, we focus on exploring acoustic information for violent scene detection. Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in visual content processing tasks. We therefore investigate using CNNs for violent scene detection based on the acoustic information in videos. We apply CNNs in two ways: directly as a classifier, or as a deep acoustic feature extractor. Experimental results on the MediaEval 2015 evaluation dataset show that CNNs are effective both as classifiers and as acoustic feature extractors. Furthermore, fusing acoustic and visual information significantly improves violent scene detection performance.