Fusing Audio-Words with Visual Features for Pornographic Video Detection

Yizhi Liu, Xiangdong Wang, Yongdong Zhang, Sheng Tang
DOI: https://doi.org/10.1109/trustcom.2011.205
2012-01-01
Journal of Image and Graphics
Abstract: Traditional approaches to filtering pornographic videos on the Internet rely on visual features of keyframes. However, they cannot meet users' needs owing to the proliferation of low-resolution videos. To improve filtering performance, we propose a novel framework that fuses audio-words with visual features for pornographic video detection. Our aim is not only to fuse the two modalities, visual images and audio signals, but also to narrow the semantic gap between low-level features and high-level concepts by using the mid-level feature "audio-words". To further improve performance, we present a segmentation algorithm based on units of the energy envelope and a decision algorithm based on periodic patterns. The results show that our approach outperforms the traditional one based on visual features and achieves satisfactory performance. Moreover, the proposed segmentation algorithm outperforms conventional fixed-length segmentation, and the proposed decision algorithm outperforms the conventional threshold-based one.
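The abstract does not spell out how audio-words are computed, so the following is only a minimal sketch of one common way such a mid-level feature could be built, not the authors' implementation. It assumes non-overlapping frames, a simple energy threshold to delimit units of the energy envelope, and a precomputed codebook (for example from k-means) whose nearest entry serves as the "audio-word" for each feature vector. The function names, the threshold, and the toy codebook are all hypothetical.

```python
import numpy as np


def short_time_energy(signal, frame_len):
    """Mean squared amplitude of consecutive non-overlapping frames."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    return (frames ** 2).mean(axis=1)


def energy_units(energy, threshold):
    """Return (start, end) frame-index pairs for runs where energy > threshold.

    Assumption: a "unit of the energy envelope" is approximated here as a
    contiguous run of frames above a fixed threshold. `end` is exclusive.
    """
    units = []
    start = None
    for i, is_active in enumerate(energy > threshold):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            units.append((start, i))
            start = None
    if start is not None:
        units.append((start, len(energy)))
    return units


def audio_word_histogram(features, codebook):
    """Quantize feature vectors to their nearest codeword and count them.

    Assumption: `codebook` is a (n_words, dim) array learned elsewhere.
    Returns a histogram normalized to sum to 1 (all zeros if no features).
    """
    features = np.asarray(features, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    hist = np.zeros(len(codebook))
    if len(features) == 0:
        return hist
    # Pairwise Euclidean distances, shape (n_features, n_words).
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()


if __name__ == "__main__":
    # Toy signal: silence, a short burst, silence, a longer burst.
    signal = np.concatenate([np.zeros(4), np.ones(4), np.zeros(4), np.ones(8)])
    energy = short_time_energy(signal, frame_len=4)
    print(energy_units(energy, threshold=0.5))
```

In a fusion setting, a histogram like this could be combined with keyframe-based visual features before classification; how the paper actually performs that fusion is not stated in the abstract.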