Violent video detection based on MoSIFT feature and sparse coding

Long Xu, Chen Gong, Jie Yang, Qiang Wu, Lixiu Yao
DOI: https://doi.org/10.1109/ICASSP.2014.6854259
2014-01-01
ICASSP
Abstract: To detect violence in a video, a common approach is to apply a local spatio-temporal descriptor to the query video and then summarize the low-level description into a high-level feature using the Bag-of-Words (BoW) model. However, traditional spatio-temporal descriptors are not discriminative enough. Moreover, the BoW model coarsely assigns each feature vector to only one visual word, which inevitably causes quantization error. To address these constraints, this paper employs the Motion SIFT (MoSIFT) algorithm to extract the low-level description of a query video. To eliminate feature noise, Kernel Density Estimation (KDE) is exploited for feature selection on the MoSIFT descriptor. To obtain a highly discriminative video feature, the paper adopts a sparse coding scheme to further process the selected MoSIFT features. Encouraging experimental results are obtained on two challenging datasets that cover both crowded and non-crowded scenes.
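The contrast the abstract draws between BoW quantization and sparse coding can be illustrated with a minimal sketch. It is not the authors' implementation: the random data stands in for MoSIFT descriptors, the dictionary is random rather than learned, the ISTA solver, the penalty `lam`, and the max-pooling step are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import numpy as np


def hard_assign_histogram(X, D):
    """BoW encoding: each descriptor votes for its single nearest visual word."""
    # Squared distance between every descriptor (row of X) and every word (row of D).
    d2 = ((X[:, None, :] - D[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=D.shape[0]).astype(float)
    return hist / hist.sum()


def sparse_codes_ista(X, D, lam=0.1, n_iter=200):
    """Sparse coding: min_A 0.5*||X - A D||_F^2 + lam*||A||_1, solved with ISTA.

    Each descriptor is reconstructed from a few dictionary atoms instead of
    being snapped to exactly one word.
    """
    # Lipschitz constant of the gradient: largest eigenvalue of D D^T.
    lipschitz = np.linalg.norm(D @ D.T, 2)
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (A @ D - X) @ D.T
        Z = A - grad / lipschitz
        # Soft-thresholding is the proximal operator of the L1 penalty.
        A = np.sign(Z) * np.maximum(np.abs(Z) - lam / lipschitz, 0.0)
    return A


def max_pool(A):
    """Pool per-descriptor codes into one video-level vector (assumed pooling)."""
    return np.abs(A).max(axis=0)


# Toy data: 20 random "descriptors" of dimension 16, dictionary of 8 unit-norm atoms.
rng = np.random.default_rng(0)
D = rng.normal(size=(8, 16))
D /= np.linalg.norm(D, axis=1, keepdims=True)
X = rng.normal(size=(20, 16))

bow_feature = hard_assign_histogram(X, D)    # one word per descriptor
sparse_feature = max_pool(sparse_codes_ista(X, D))  # several atoms per descriptor
```

In this sketch both encoders return a vector with one entry per dictionary atom, so either could feed the same downstream classifier; the difference is only in how each descriptor is distributed over the atoms.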