Suspicious activities detection using spatial–temporal features based on vision transformer and recurrent neural network
Saba Hameed,Javaria Amin,Muhammad Almas Anjum,Muhammad Sharif
DOI: https://doi.org/10.1007/s12652-024-04818-7
IF: 3.662
2024-08-21
Journal of Ambient Intelligence and Humanized Computing
Abstract:Nowadays there is growing demand for surveillance applications due to the safety and security from anomalous events. An anomaly in the video is referred to as an event that has some unusual behavior. Although time is required for the recognition of these anomalous events, computerized methods might help to decrease it and perform efficient prediction. However, accurate anomaly detection is still a challenge due to complex background, illumination, variations, and occlusion. To handle these challenges a method is proposed for a vision transformer convolutional recurrent neural network named ViT-CNN-RCNN model for the classification of suspicious activities based on frames and videos. The proposed pre-trained ViT-base-patch16-224-in21k model contains 224 × 224 × 3 video frames as input and converts into a 16 × 16 patch size. The ViT-base-patch16-224-in21k has a patch embedding layer, ViT encoder, and ViT transformer layer having 11 blocks, layer-norm, and ViT pooler. The ViT model is trained on selected learning parameters such as 20 training epochs, and 10 batch-size to categorize the input frames into thirteen different classes such as robbery, fighting, shooting, stealing, shoplifting, Arrest, Arson, Abuse, exploiting, Road Accident, Burglary, and Vandalism. The CNN-RNN sequential model is designed to process sequential data, that contains an input layer, GRU layer, GRU-1 Layer and Dense Layer. This model is trained on optimal hyperparameters such as 32 video frame sizes, 30 training epochs, and 16 batch-size for classification into corresponding class labels. The proposed model is evaluated on UNI-crime and UCF-crime datasets. The experimental outcomes conclude that the proposed approach better performed as compared to recently published works.
computer science, information systems,telecommunications, artificial intelligence