Fudan-NJUST at MediaEval 2014: Violent Scenes Detection Using Deep Neural Networks.

Qi Dai, Zuxuan Wu, Yu-Gang Jiang, Xiangyang Xue, Jinhui Tang
2014-01-01
Abstract: The Violent Scenes Detection task evaluates algorithms that automatically localize violent segments in both Hollywood movies and short web videos. The definition of violence is subjective: “the segments that one would not let an 8 years old child see in a movie because they contain physical violence”. This is a highly challenging problem because positive instances vary strongly in content. In this year’s evaluation, we adopted our recently proposed classification method, named regularized DNN, which fuses multiple features using Deep Neural Networks (DNN). We extracted a set of visual and audio features that have been observed to be useful, and then applied the regularized DNN for feature fusion and classification. Results indicate that using multiple features is still very helpful and, more importantly, that our proposed regularized DNN offers significantly better results than the popular SVM. We achieved a mean average precision of 0.63 for the main task and 0.60 for the generalization task.

1. SYSTEM DESCRIPTION

Figure 1 gives an overview of our system. In this short paper, we briefly describe each of the key components. For the task definition, data, and evaluation metric, interested readers may refer to [1].
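The abstract describes fusing several visual and audio features with a DNN instead of an SVM. The Python sketch below shows only the general shape of such a fusion network as a forward pass. Each feature type goes through its own hidden layer, the branch outputs are concatenated into a shared fusion layer, and a single sigmoid unit produces a violence score for a clip. It is an illustrative assumption, not the authors' regularized DNN. The feature names, layer sizes, and random initialization are hypothetical. Training is omitted, and so is the regularization that gives the regularized DNN its name.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]

# Hypothetical sketch: NOT the paper's regularized DNN (no training, no regularizer).


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Uniform random weights scaled by 1/sqrt(n_in), zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: Vector, layer: Layer) -> Vector:
    """Affine map W x + b; W holds one row per output unit."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def sigmoid(a: float) -> float:
    # Numerically stable logistic function.
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


class FusionNet:
    """Per-feature hidden layers, then a shared fusion layer, then one score."""

    def __init__(self, feature_dims: Dict[str, int], hidden: int, fused: int, seed: int = 0):
        rng = random.Random(seed)
        self.names = sorted(feature_dims)  # fixed order for concatenation
        self.branches = {n: init_layer(feature_dims[n], hidden, rng) for n in self.names}
        self.fusion = init_layer(hidden * len(self.names), fused, rng)
        self.output = init_layer(fused, 1, rng)

    def score(self, features: Dict[str, Vector]) -> float:
        # 1) Each feature type gets its own hidden layer.
        concat: Vector = []
        for n in self.names:
            concat.extend(relu(dense(features[n], self.branches[n])))
        # 2) The fusion layer sees all branch outputs at once.
        fused = relu(dense(concat, self.fusion))
        # 3) A single sigmoid unit gives a score in (0, 1).
        return sigmoid(dense(fused, self.output)[0])


# Usage with hypothetical feature names/dimensions and random inputs.
dims = {"visual_a": 8, "visual_b": 6, "audio": 4}
net = FusionNet(dims, hidden=5, fused=3, seed=42)
rng = random.Random(1)
clip = {n: [rng.random() for _ in range(d)] for n, d in dims.items()}
p = net.score(clip)
print(f"violence score: {p:.3f}")
```

In a setup like this, the weights would be learned from labeled training segments. The per-segment scores would then be ranked to compute mean average precision.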