Abstract: Surveillance cameras are now deployed in many public places to monitor human activity. Detecting violence in video through automatic analysis is of significant value to law enforcement, yet most monitoring systems still require operators to identify violent scenes manually, which slows the response. Violence detection is also a challenging problem because violence is so broadly defined. In this work, we focus on physical violence involving two or more people. We propose a novel method that uses an automated mobile neural architecture search network together with a convolutional long short-term memory (ConvLSTM) network to extract spatiotemporal features from video. Two types of pooling layers, max pooling and average pooling, are then added to capture richer features. The pooled features are standardized, and their dimensionality is reduced with linear discriminant analysis, which removes redundant features and lets the classifiers work well in a low-dimensional space. For classification, we trained and tested several machine learning models: random forest, support vector machine (SVM), and K-nearest neighbor classifiers. We build a combined dataset containing violent and non-violent scenes drawn from three public datasets: hockey, movie, and violent flow. The proposed method is evaluated, in terms of detection accuracy, on the combined dataset and on the three benchmark datasets individually. With the SVM classifier, our model achieves accuracies of 97.5%, 100%, and 96% on the combined, movie, and violent flow datasets, respectively; on the hockey dataset, the best result, 99.3%, is obtained with the random forest classifier.
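
The stages after feature extraction (pooling, standard scaling, linear discriminant analysis, and classification) can be illustrated with a minimal sketch. It is not the authors' implementation. Random synthetic arrays stand in for the deep spatiotemporal features. The tensor shape, the choice to pool over the spatial axes, the RBF kernel, the train/test split, and all names such as `pool_features` are assumptions made only for illustration.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def pool_features(feature_maps: np.ndarray) -> np.ndarray:
    """Concatenate global max and global average pooling.

    feature_maps: array of shape (n_videos, height, width, channels).
    Returns an array of shape (n_videos, 2 * channels).
    """
    max_pooled = feature_maps.max(axis=(1, 2))
    avg_pooled = feature_maps.mean(axis=(1, 2))
    return np.concatenate([max_pooled, avg_pooled], axis=1)


# Synthetic stand-in for deep features (assumption: no real video is used).
rng = np.random.default_rng(0)
n_videos, height, width, channels = 200, 4, 4, 16
labels = np.arange(n_videos) % 2  # 0 = non-violent, 1 = violent
feature_maps = rng.normal(size=(n_videos, height, width, channels))
# Shift the "violent" class so the two classes are separable.
feature_maps += labels[:, None, None, None] * 1.0

features = pool_features(feature_maps)

# Standard scaling -> LDA (at most n_classes - 1 = 1 component) -> SVM.
model = make_pipeline(
    StandardScaler(),
    LinearDiscriminantAnalysis(n_components=1),
    SVC(kernel="rbf"),
)
model.fit(features[:150], labels[:150])

accuracy = model.score(features[150:], labels[150:])
reduced = model[:-1].transform(features[150:])  # output of scaler + LDA only
print("reduced shape:", reduced.shape)
print("held-out accuracy:", accuracy)
```

With two classes, linear discriminant analysis can produce at most one discriminant component, so the classifier in this sketch operates on a single feature per video. Swapping `SVC` for `RandomForestClassifier` or `KNeighborsClassifier` from scikit-learn would cover the other two classifier families named in the abstract.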
