Abstract: Audio-based pornography detection enables efficient adult content filtering without sacrificing performance by exploiting distinct spectral characteristics. To improve it, we explore pornographic sound modeling based on different neural architectures and acoustic features. We find that a CNN trained on log-mel spectrograms achieves the best performance on the Pornography-800 dataset. Our experimental results also show that log-mel spectrograms provide better representations for the models to recognize pornographic sounds. Finally, to classify whole audio waveforms rather than segments, we employ a voting segment-to-audio technique that yields the best audio-level detection results.
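The abstract names voting as the best segment-to-audio technique but does not spell out its details. The following is a minimal sketch, assuming the segment classifier outputs one pornographic-class probability per segment, that each segment is labeled with a 0.5 threshold, and that ties are broken toward the pornographic label. The function names and all three choices are hypothetical. Mean-probability aggregation is included only as a common point of comparison, not as the paper's method.

```python
from typing import Sequence


def vote_audio_label(segment_probs: Sequence[float], threshold: float = 0.5) -> int:
    """Majority voting: label every segment, then return the most frequent label.

    Assumption (hypothetical): a tie counts as pornographic (1), so the filter
    errs on the side of flagging. Returns 1 (pornographic) or 0 (non-pornographic).
    """
    if not segment_probs:
        raise ValueError("need at least one segment prediction")
    positive_votes = sum(1 for p in segment_probs if p >= threshold)
    negative_votes = len(segment_probs) - positive_votes
    return 1 if positive_votes >= negative_votes else 0


def mean_prob_audio_label(segment_probs: Sequence[float], threshold: float = 0.5) -> int:
    """Comparison baseline: average the segment probabilities, then threshold once."""
    if not segment_probs:
        raise ValueError("need at least one segment prediction")
    return 1 if sum(segment_probs) / len(segment_probs) >= threshold else 0


if __name__ == "__main__":
    probs = [0.55, 0.6, 0.52, 0.0, 0.0]
    print(vote_audio_label(probs))       # 3 of 5 segments vote pornographic -> 1
    print(mean_prob_audio_label(probs))  # mean probability 0.334 -> 0
```

The example shows why the choice matters. Voting counts each segment equally, regardless of confidence, whereas averaging lets a few near-zero segments outweigh several moderately positive ones.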

What problem does this paper attempt to address?

This paper addresses the detection of pornographic sounds in adult videos using neural networks, with the goal of efficient and accurate adult content filtering. Specifically, the authors explore pornographic sound modeling with different neural network architectures and acoustic features to improve pornographic content detection at the audio level.

### Main problems

1. **Limitations of visual methods**: Most existing automatic adult video detection relies on visual classification. These methods are sensitive to image quality (e.g., lighting and blur) and require substantial computing resources and storage.
2. **Advantages of audio methods, and the research gap**: Audio-based methods have lower computing and storage requirements and can use distinctive spectral features to distinguish pornographic from non-pornographic audio. However, research in this area is relatively scarce, especially research that applies deep learning.

### Research objectives

- Explore and compare the performance of different neural network architectures, such as the fully connected feedforward neural network (FFNN) and the convolutional neural network (CNN), in pornographic sound detection.
- Evaluate the impact of different acoustic features, such as MFCCs and log-mel spectrograms, on model performance.
- Propose and evaluate methods for converting segment-level predictions into audio-level predictions so that whole audio files can be classified effectively.

### Core contributions

- The CNN trained on log-mel spectrograms achieved the best segment-level and audio-level detection performance on the Pornography-800 dataset.
- Voting was identified as the most effective segment-to-audio conversion method, further improving the accuracy of audio-level detection.
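MFCCs and log-mel spectrograms are closely related. MFCCs are conventionally derived from log-mel energies by applying a type-II discrete cosine transform (DCT) along the mel-band axis and keeping only the first few coefficients. The following is a minimal plain-Python sketch of that step, assuming an orthonormal DCT-II and 13 retained coefficients. These are common defaults, not settings taken from the paper, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def dct_ii_ortho(x: Sequence[float]) -> List[float]:
    """Orthonormal DCT-II of a 1-D sequence (same scaling as SciPy's norm='ortho')."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def mfcc_from_log_mel(log_mel_frame: Sequence[float], n_mfcc: int = 13) -> List[float]:
    """MFCCs of one frame: DCT-II across mel bands, truncated to n_mfcc coefficients.

    n_mfcc=13 is an assumed, commonly used default.
    """
    if n_mfcc > len(log_mel_frame):
        raise ValueError("n_mfcc cannot exceed the number of mel bands")
    return dct_ii_ortho(log_mel_frame)[:n_mfcc]
```

Every DCT coefficient mixes all mel bands, and the higher coefficients are discarded. As a result, MFCCs do not preserve the band-by-band time-frequency layout of a log-mel spectrogram, which is the layout a CNN's local convolution filters scan.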
### Formula representation

The formulas in the paper mainly concern audio feature extraction and model training. For example:

- The log-mel spectrogram:

\[ S_{\text{log-mel}} = \log\left(\epsilon + \mathbf{M}\,|\mathrm{STFT}(x)|^{2}\right) \]

where \(x\) is the audio signal, \(\mathrm{STFT}(x)\) is its short-time Fourier transform, \(|\cdot|^{2}\) is taken element-wise to give the power spectrogram, \(\mathbf{M}\) is the mel filterbank matrix that pools frequency bins into mel bands, and \(\epsilon\) is a small constant that avoids \(\log 0\).

- The binary cross-entropy loss used during model training:

\[ L(y,\hat{y}) = -\frac{1}{N}\sum_{i=1}^{N}\left[y_{i}\log(\hat{y}_{i}) + (1-y_{i})\log(1-\hat{y}_{i})\right] \]

where \(y_{i}\) is the true label of sample \(i\), \(\hat{y}_{i}\) is its predicted probability, and \(N\) is the number of samples.

Through these methods and analyses, the authors improve the performance of pornographic audio detection and provide useful references for future research.
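A log-mel spectrogram is usually computed in four steps. First, frame and window the signal. Second, take the power spectrum of each frame. Third, pool frequency bins with a triangular mel filterbank \(\mathbf{M}\). Fourth, apply a logarithm with a small offset \(\epsilon\), i.e. \(\log(\epsilon + \mathbf{M}|\mathrm{STFT}(x)|^2)\).

The following is a minimal, dependency-free sketch of those steps. Several choices are assumptions made for illustration and are not the paper's settings: the periodic Hann window, the HTK mel formula, the defaults `n_fft=512`, `hop=256`, `n_mels=40`, and `eps=1e-10`. The function names are also hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def hz_to_mel(f: float) -> float:
    # HTK-style mel scale; other conventions (e.g. Slaney) exist.
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrogram(x: Sequence[float], n_fft: int, hop: int) -> List[List[float]]:
    """|STFT(x)|^2 per frame (periodic Hann window, bins 0..n_fft//2)."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n_fft) for i in range(n_fft)]
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = [x[start + i] * window[i] for i in range(n_fft)]
        row = []
        for k in range(n_fft // 2 + 1):
            # Naive O(n_fft^2) DFT for clarity; practical code would use an FFT.
            X_k = sum(seg[i] * cmath.exp(-2j * math.pi * k * i / n_fft) for i in range(n_fft))
            row.append(abs(X_k) ** 2)
        frames.append(row)
    return frames


def mel_filterbank(n_mels: int, n_fft: int, sr: int) -> List[List[float]]:
    """Matrix M: n_mels triangular filters evenly spaced in mel between 0 Hz and sr/2."""
    bin_freqs = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    mel_max = hz_to_mel(sr / 2.0)
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, center, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_freqs:
            if lo < f <= center:
                row.append((f - lo) / (center - lo))  # rising edge of the triangle
            elif center < f < hi:
                row.append((hi - f) / (hi - center))  # falling edge of the triangle
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel_spectrogram(x: Sequence[float], sr: int, n_fft: int = 512, hop: int = 256,
                        n_mels: int = 40, eps: float = 1e-10) -> List[List[float]]:
    """log(eps + M |STFT(x)|^2): one row per frame, one column per mel band."""
    power = power_spectrogram(x, n_fft, hop)
    bank = mel_filterbank(n_mels, n_fft, sr)
    return [[math.log(eps + sum(w * p for w, p in zip(filt, frame))) for filt in bank]
            for frame in power]
```

With these defaults, a 1-second clip at 16 kHz yields 61 frames of 40 mel bands, although the naive DFT makes that slow. The resulting frames-by-bands matrix is the kind of 2-D time-frequency input a CNN can convolve over.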

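The binary cross-entropy loss defined above can be computed directly. The following is a minimal sketch, assuming labels are 0/1 integers (1 = pornographic) and predictions are sigmoid probabilities. The clipping constant `eps` is an illustrative assumption that keeps \(\log 0\) from occurring. Deep learning frameworks provide numerically stabler variants that operate on logits.

```python
import math
from typing import Sequence


def binary_cross_entropy(y_true: Sequence[int], y_pred: Sequence[float],
                         eps: float = 1e-7) -> float:
    """Mean binary cross-entropy L(y, y_hat) over N samples.

    Predictions are clipped to [eps, 1 - eps] (eps is an assumed value)
    so that neither log term can be evaluated at zero.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)
```

For instance, predicting 0.5 for every sample gives a loss of \(\ln 2 \approx 0.693\) regardless of the labels, which is a useful sanity check for an untrained binary classifier.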
What Did I Just Hear? Detecting Pornographic Sounds in Adult Videos Using Neural Networks