Abstract: Automated, AI-based analysis of surveillance video is a prominent research field with real-world applications. Several techniques for recognizing activities, behaviour, and violent actions have been proposed in the literature. Violence detection on real-world data nevertheless remains a major challenge, because few training datasets cover complex scenarios and objects that appear at varying scales while performing different activities. In this paper, we present a novel deep learning model built on specially designed frame encoders for spatial feature extraction, which generalize across many challenges, such as varying lighting conditions and indoor and outdoor scenes. The spatial features are then stacked and analyzed by a temporal deep learning model that learns temporal dependencies, using both past and future information when predicting the violent or normal class. In the literature, violence detection is usually treated as binary classification: fight or no fight. In contrast, we present a multi-class classification of violent activities that distinguishes different types of human violence, such as assault and shooting. The data for multi-class violence classification are extracted from the well-known real-world anomaly detection dataset UCF-Crime, from which only six human-involved anomaly types are used to prepare the training and testing sets. We report accuracies of 41.1% and 48.85% for sequences of 16 and 8 frames, respectively. These results show that more research is needed to increase the robustness of deep models for violence detection. Finally, on violence detection datasets, the proposed model achieves marginally better results than recent methods, indicating the effectiveness of the proposed feature extraction and temporal learning mechanism.
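The pipeline described in the abstract (per-frame spatial encoding, stacking of the frame features, and a temporal model that reads the sequence in both directions) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The encoder here is a placeholder (global average pooling plus a linear projection) standing in for the paper's frame encoders, the temporal model is a plain tanh recurrent network rather than whatever the paper uses, and all names, layer sizes, and the random weights are assumptions chosen only for illustration. The six output classes mirror the six anomaly types mentioned above.

```python
import numpy as np


def encode_frame(frame, proj):
    """Placeholder spatial encoder (assumption): average-pool each channel,
    then apply a linear projection. A real frame encoder would be a CNN."""
    pooled = frame.mean(axis=(0, 1))      # shape: (channels,)
    return np.tanh(proj @ pooled)         # shape: (feat_dim,)


def rnn_pass(feats, w_in, w_rec):
    """Simple tanh recurrent pass over a (T, feat_dim) sequence.
    Returns the hidden state at every step, shape (T, hidden)."""
    hidden = np.zeros(w_rec.shape[0])
    states = []
    for x in feats:
        hidden = np.tanh(w_in @ x + w_rec @ hidden)
        states.append(hidden)
    return np.stack(states)


def classify_clip(clip, params):
    """clip: (T, H, W, C) array of frames. Returns class probabilities."""
    # 1) Spatial features per frame, stacked along the time axis.
    feats = np.stack([encode_frame(f, params["proj"]) for f in clip])
    # 2) Forward pass sees the past; backward pass sees the future.
    fwd = rnn_pass(feats, params["w_in_f"], params["w_rec_f"])
    bwd = rnn_pass(feats[::-1], params["w_in_b"], params["w_rec_b"])[::-1]
    # 3) fwd[-1] and bwd[0] each summarise the whole clip from one direction.
    summary = np.concatenate([fwd[-1], bwd[0]])
    logits = params["w_out"] @ summary
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()


def init_params(channels=3, feat_dim=32, hidden=16, num_classes=6, seed=0):
    """Random, untrained weights; all sizes are hypothetical."""
    rng = np.random.default_rng(seed)
    s = 0.1
    return {
        "proj": rng.normal(0, s, (feat_dim, channels)),
        "w_in_f": rng.normal(0, s, (hidden, feat_dim)),
        "w_rec_f": rng.normal(0, s, (hidden, hidden)),
        "w_in_b": rng.normal(0, s, (hidden, feat_dim)),
        "w_rec_b": rng.normal(0, s, (hidden, hidden)),
        "w_out": rng.normal(0, s, (num_classes, 2 * hidden)),
    }


params = init_params()
clip16 = np.random.default_rng(1).random((16, 64, 64, 3))  # 16-frame sequence
probs = classify_clip(clip16, params)
print(probs.shape, int(probs.argmax()))
```

The same function accepts 8-frame sequences without changes, since the recurrent passes iterate over whatever sequence length they are given. With untrained weights the output probabilities are meaningless; the sketch only shows how the data flows through the stages.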

Deep Learning for Activity Recognition Using Audio and Video

Audio-visual voice activity detection using diffusion maps

Human Action Recognition Using Deep Learning Methods.

Enhancing Human Action Recognition and Violence Detection Through Deep Learning Audiovisual Fusion

Literature Review of Deep-Learning-Based Detection of Violence in Video

Human Action Recognition From Digital Videos Based on Deep Learning.

Deep Learning-Based Human Action Recognition in Videos

Video-Based Human Activity Recognition Using Deep Learning Approaches

Fine-Grained Classroom Activity Detection from Audio with Neural Networks

Deep Neural Networks in Video Human Action Recognition: A Review

Video Dynamics Detection Using Deep Neural Networks

Deep Learning Methods for Human Behavior Recognition.

DeepSafety: Multi-level Audio-Text Feature Extraction and Fusion Approach for Violence Detection in Conversations

Towards Real-world Violence Recognition via Efficient Deep Features and Sequential Patterns Analysis

A Toolchain for Comprehensive Audio/Video Analysis Using Deep Learning Based Multimodal Approach (A use case of riot or violent context detection)

A Review of Deep Learning-based Human Activity Recognition on Benchmark Video Datasets

Mobile Neural Architecture Search Network and Convolutional Long Short-Term Memory-Based Deep Features Toward Detecting Violence from Video

Violence detection in videos using deep recurrent and convolutional neural networks

Human activity recognition using deep learning approaches and single frame cnn and convolutional lstm

A Comparative Analysis of Hybrid Deep Learning Models for Human Activity Recognition