A New Special Video Detection Algorithm Based on 3D Convolution CoHOG and MIL
Wei SONG,Dong REN,Jing YU,Zhen-Guo QI
DOI: https://doi.org/10.11897/SP.J.1016.2019.00149
2019-01-01
Abstract: Existing video content detection algorithms based on gradient direction histograms focus on features extracted from individual two-dimensional video frames and ignore the correlation between frames along the time dimension. The frames of a video form an inseparable whole, and only the full sequence of consecutive frames expresses its true and complete semantics. The information extracted from a video is inaccurate if only key frames are considered. The inter-frame correlation carries semantic information and is important for video content detection. The latent co-occurrence relationship between local gradient direction features also helps improve detection accuracy. Equally important, pooling over adjacent features reduces the dimension of high-dimensional features while avoiding the loss of hidden action information. We construct the 3D Conv-CoHOG feature by exploiting the hidden structural information of video frames along the time dimension and extending the two-dimensional CoHOG feature to three dimensions. Pooling over neighboring features reduces the feature dimension effectively. The algorithm addresses both the loss of recognition accuracy caused by neglecting inter-frame information and the high computational complexity caused by high-dimensional features. Video features are mapped to the instances and bags of multiple-instance learning (MIL), which handles videos of different lengths in a simple way. In this article, we first introduce the research field and the importance of violent video content detection. We then summarize and classify previous work. Existing algorithms are divided into three categories: those based on multi-modal audio and video features fused with color features, those based on the fusion of different action features, and those based on neural networks and unsupervised feature extraction. The most important part of this article is the introduction of the algorithmic structure. We introduce the concept of HOG features and their extraction process, compare the extraction procedures of HOG, CoHOG, and Conv-CoHOG as well as those of HOG and HOG3D, and propose a new special video content detection algorithm, 3D convolution CoHOG, extended from Conv-CoHOG. We compare the proposed feature with the earlier ones in terms of computational dimension, feature dimension, and the relationship between adjacent features. Section 3.2 presents the framework of the new algorithm. Sections 3.3 to 3.7 cover the construction of the feature extraction unit, the quantization of three-dimensional gradients, the extraction of Co-HOG3D, the extraction of Conv-CoHOG3D, and the training of the multiple-instance learning model. Section 4.1 describes the two data sets used in the experiments, and Section 4.2 gives the parameter settings and evaluation criteria. We then analyze the experimental results. In the training stage we use three classifiers, each with several implementations. In testing, we compare the results of different features, analyze the reasons for the differences, and assess the effectiveness of the new feature. Finally, we put forward an effective solution for special video content detection. The proposed algorithm achieves the highest detection accuracy on the Hockey and Movie data sets, 3% higher than the best existing algorithm on the Hockey data set and 0.5% higher on the Movie data set, which demonstrates its effectiveness for special video detection.
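
As a rough illustration of the ideas summarized above, the following Python sketch quantizes three-dimensional gradient orientations in a short clip, counts co-occurrences of quantized orientations at neighboring positions, and maps a video of arbitrary length to a bag of per-clip instances in the sense of multiple-instance learning. It is not the authors' implementation: the abstract does not specify the quantization scheme, bin counts, neighbor offsets, or clip length, so `n_az`, `n_el`, `OFFSETS`, `clip_len`, and all function names are assumptions made for illustration. The convolution and pooling stages of Conv-CoHOG and the MIL classifier itself are omitted.

```python
import numpy as np

# All names and parameter values here are hypothetical, not taken from the paper.
OFFSETS = [(0, 0, 1), (0, 1, 0), (1, 0, 0)]  # (dt, dy, dx) neighbor offsets


def quantize_gradients_3d(volume, n_az=4, n_el=2):
    """Assign each voxel of a (T, H, W) volume a quantized orientation label."""
    gt, gy, gx = np.gradient(volume.astype(np.float64))
    az = np.arctan2(gy, gx)                 # in-plane angle, in [-pi, pi]
    el = np.arctan2(gt, np.hypot(gx, gy))   # temporal elevation, in [-pi/2, pi/2]
    az_bin = np.minimum((az + np.pi) / (2 * np.pi) * n_az, n_az - 1).astype(np.int64)
    el_bin = np.minimum((el + np.pi / 2) / np.pi * n_el, n_el - 1).astype(np.int64)
    return az_bin * n_el + el_bin


def cooccurrence_histogram(labels, offset, n_labels):
    """Count label pairs (labels[p], labels[p + offset]) for a non-negative offset."""
    dt, dy, dx = offset
    t, h, w = labels.shape
    a = labels[: t - dt, : h - dy, : w - dx]
    b = labels[dt:, dy:, dx:]
    hist = np.zeros((n_labels, n_labels), dtype=np.int64)
    np.add.at(hist, (a.ravel(), b.ravel()), 1)  # unbuffered accumulation
    return hist


def clip_feature(clip, n_az=4, n_el=2):
    """Concatenate co-occurrence histograms over all offsets for one clip."""
    labels = quantize_gradients_3d(clip, n_az, n_el)
    n_labels = n_az * n_el
    hists = [cooccurrence_histogram(labels, off, n_labels).ravel() for off in OFFSETS]
    return np.concatenate(hists)


def video_to_bag(video, clip_len=8):
    """Split a (T, H, W) video into non-overlapping clips; each clip is one instance."""
    starts = range(0, video.shape[0] - clip_len + 1, clip_len)
    return np.stack([clip_feature(video[s : s + clip_len]) for s in starts])


if __name__ == "__main__":
    video = np.random.default_rng(0).random((20, 16, 16))  # synthetic grayscale video
    bag = video_to_bag(video)
    print(bag.shape)
```

Each row of the returned array is one instance, and the number of rows grows with the video length while the feature dimension stays fixed. Under the standard MIL assumption, a bag would be labeled positive if at least one of its instances is positive, which is one way variable-length videos can be handled by a single classifier.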